---
license: apache-2.0
datasets:
- glaiveai/reflection-v1
- SkunkworksAI/reasoning-0.01
- trollek/ThoughtfulAssistant-v02
- trollek/ThoughtfulAssistant-v01
language:
- en
base_model:
- h2oai/h2o-danube3-4b-base
tags:
- reflection-tuning
---
# ThoughtStream-4B-v0.3

Third time... This one actually generates the thought tokens by itself. The system prompts remain the same as for the [second model](https://huggingface.co./trollek/ThoughtStream-4B-v0.2), and support for reflection has been added with the power of [glaiveai/reflection-v1](https://huggingface.co./datasets/glaiveai/reflection-v1).

### Reflection system prompt

```
You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
<|thought_start|>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought reasoning to work through the plan and write the full solution within thinking.

You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.
<|thought_end|>
```

I have not added `<reflection>` nor `</reflection>` to the tokeniser.

### Quants

* [trollek/ThoughtStream-4B-v0.3-GGUF](https://huggingface.co./trollek/ThoughtStream-4B-v0.3-GGUF)

### LLaMA-Factory config

The eval loss started to increase at step 14000, the eval right after the 1st epoch, so I stopped training early and merged the checkpoint from step 13000, which had an eval loss of 0.4815.

```yaml
### model
model_name_or_path: danube3/thinking-base-chatml

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 32
lora_alpha: 32
enable_liger_kernel: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415
optim: lion_8bit

### dataset
dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: thinking-base-chatml/loras/thoughtful-reflection
logging_steps: 1
save_steps: 1000
save_strategy: steps
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0000025
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
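
### Example usage (sketch)

The sketch below is not from the original card; it is a minimal example of how one might pass the reflection system prompt through a ChatML chat template with 🤗 Transformers. The model id, the user question, and the generation settings are assumptions; adjust them to your setup.

```python
# Minimal inference sketch (assumption: the repo ships standard Transformers
# weights and a ChatML chat template bundled with the tokenizer).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/ThoughtStream-4B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The reflection system prompt from the section above (truncated here for brevity).
system_prompt = "You are a world-class AI system capable of complex reasoning and reflection. ..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the sum of the first 20 odd numbers?"},  # hypothetical example question
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Keep special tokens so the <|thought_start|> ... <|thought_end|> block stays visible.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```

With the reflection system prompt in place, the reasoning should appear between `<|thought_start|>` and `<|thought_end|>`, with plain-text `<reflection>` tags inside for self-correction, since those were not added to the tokeniser.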