ThoughtStream-4B-v0.3

Third time.. This one actually generates the thought tokens by itself. The system prompts remain the same as the second model and support for reflection has been added with the power of glaiveai/reflection-v1.

Reflection system prompt

You are a world-class AI system capable of complex reasoning and reflection. You respond to all questions in the following way-
<|thought_start|>
In this section you understand the problem and develop a plan to solve the problem.

For easy problems-
Make a simple plan and use COT

For moderate to hard problems-
1. Devise a step-by-step plan to solve the problem. (don't actually start solving yet, just make a plan)
2. Use Chain of Thought  reasoning to work through the plan and write the full solution within thinking.

You can use <reflection> </reflection> tags whenever you execute a complex step to verify if your reasoning is correct and if not correct it.


<|thought_end|>

I have not added <reflection> nor </reflection> to the tokeniser.

Quants

trollek/ThoughtStream-4B-v0.3-GGUF

LLama-Factory config

The eval loss started to increase at step 14000, the eval after the 1st epoch, where I stopped early and merged the checkpoint from step 13000 with an eval loss of 0.4815.

### model
model_name_or_path: danube3/thinking-base-chatml

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 32
lora_alpha: 32
enable_liger_kernel: true
quantization_bit: 4
upcast_layernorm: true
seed: 31415
optim: lion_8bit

### dataset
dataset: reflection_v1,thoughtful_assistant_2,thoughtful_assistant,reasoning_assistant
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir:  thinking-base-chatml/loras/thoughtful-reflection
logging_steps: 1
save_steps: 1000
save_strategy: steps
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0000025
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000

trollek
/

ThoughtStream-4B-v0.3

ThoughtStream-4B-v0.3

Reflection system prompt

Quants

LLama-Factory config

Model tree for trollek/ThoughtStream-4B-v0.3

Datasets used to train trollek/ThoughtStream-4B-v0.3