---
license: apache-2.0
datasets:
- glaiveai/reflection-v1
- SkunkworksAI/reasoning-0.01
- trollek/ThoughtfulAssistant-v02
- trollek/ThoughtfulAssistant-v01
language:
- en
base_model:
- h2oai/h2o-danube3-4b-base
tags:
- reflection-tuning
---
# ThoughtStream-4B-v0.3
Third time's the charm: this one actually generates the thought tokens by itself. The system prompts remain the same as in the [second model](https://huggingface.co./trollek/ThoughtStream-4B-v0.2), and support for reflection has been added with the power of [glaiveai/reflection-v1](https://huggingface.co./datasets/glaiveai/reflection-v1).
### Reflection system prompt
```
You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
<|thought_start|>
In this section you understand the problem and develop a plan to solve the problem.
For easy problems-
Make a simple plan and use COT
For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.
You can use <reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.
<|thought_end|>
```
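To use this prompt at inference time, something along these lines should work (a minimal sketch, assuming the standard `transformers` chat-template workflow; the example question and generation settings are arbitrary):

```python
# Sketch: prompting the model with the reflection system prompt via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/ThoughtStream-4B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Paste the full reflection system prompt from above here.
reflection_prompt = "You are a world-class AI system capable of complex reasoning and reflection. ..."

messages = [
    {"role": "system", "content": reflection_prompt},
    {"role": "user", "content": "What is the sum of the first 20 odd numbers?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)

# Keep special tokens so the <|thought_start|>...<|thought_end|> section stays visible.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```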
I have not added `<reflection>` nor `</reflection>` to the tokeniser.
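If you want to see how the markers are handled, a quick check like the following (a sketch using the standard tokenizers API) prints how each one is split:

```python
# Sketch: inspect how the thought delimiters and reflection tags are tokenised.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("trollek/ThoughtStream-4B-v0.3")
for tag in ["<|thought_start|>", "<|thought_end|>", "<reflection>", "</reflection>"]:
    print(f"{tag!r} -> {tok.tokenize(tag)}")
```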
### Quants
* [trollek/ThoughtStream-4B-v0.3-GGUF](https://huggingface.co./trollek/ThoughtStream-4B-v0.3-GGUF)
### LLaMA-Factory config
The eval loss started to increase at step 14000 (the first eval after the 1st epoch), so I stopped training early and merged the checkpoint from step 13000, which had an eval loss of 0.4815 (a merge sketch follows the config below).
```yaml
### model
model_name_or_path: danube3/thinking-base-chatml
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 32
lora_alpha: 32
enable_liger_kernel: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415
optim: lion_8bit
### dataset
dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12
### output
output_dir: thinking-base-chatml/loras/thoughtful-reflection
logging_steps: 1
save_steps: 1000
save_strategy: steps
plot_loss: true
overwrite_output_dir: false
### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0000025
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2
### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
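For reference, merging a LoRA checkpoint like the one above back into its base can also be done with PEFT. This is a sketch under assumptions: `danube3/thinking-base-chatml` is the local base path from the config, and `checkpoint-13000` follows the default Trainer checkpoint naming; the actual merge may well have been done with LLaMA-Factory's own export command instead.

```python
# Sketch: merge the step-13000 LoRA checkpoint into the base model with PEFT.
# Paths are taken from the config above; the checkpoint directory name is assumed.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("danube3/thinking-base-chatml", torch_dtype="auto")
merged = PeftModel.from_pretrained(
    base, "thinking-base-chatml/loras/thoughtful-reflection/checkpoint-13000"
).merge_and_unload()

merged.save_pretrained("ThoughtStream-4B-v0.3")
AutoTokenizer.from_pretrained("danube3/thinking-base-chatml").save_pretrained("ThoughtStream-4B-v0.3")
```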