Bloom-1b7-creative-writing-IT

This model is a fine-tuned version of bigscience/bloom-1b7 on a creative-writing (short story) dataset:

https://huggingface.co./datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The training and evaluation data are available at: https://huggingface.co./datasets/adambjorn/UnrelatedForgettingOverhead/viewer/creative

Training procedure

The model was instruction tuned on the dataset in the following way:

Given the set of prompts:

prompts = [
    "Write a creative short story based on the following title:",
    "Here is a title for a story. Craft a short narrative around it:",
    "Using the title given, develop a short story:",
    "Imagine a short story that starts with this title:",
    "Create a brief story with the following title:"
]

each training example is generated by concatenating one of the prompts with the 'title' and 'selftext' in the following way:

concatenated_texts = [random.choice(prompts) + " " + title + "</s>" + "Story: " + selftext for title, selftext in zip(titles, selftexts)]
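
The construction above can be sketched as a small self-contained script. This is a minimal illustration, not the author's training code: the helper name build_training_text, the seeded random.Random instance, and the two placeholder title/story pairs are assumptions added for the example; only the prompt list and the concatenation pattern come from the card.

```python
import random

# Prompt templates copied from the card.
PROMPTS = [
    "Write a creative short story based on the following title:",
    "Here is a title for a story. Craft a short narrative around it:",
    "Using the title given, develop a short story:",
    "Imagine a short story that starts with this title:",
    "Create a brief story with the following title:",
]


def build_training_text(title: str, selftext: str, rng: random.Random) -> str:
    """Join a randomly chosen prompt, the title, and the story body.

    Hypothetical helper; it mirrors the card's list comprehension:
    prompt + " " + title + "</s>" + "Story: " + selftext
    """
    return rng.choice(PROMPTS) + " " + title + "</s>" + "Story: " + selftext


# Placeholder rows (assumption). The real dataset supplies 'title' and 'selftext'.
titles = ["The Last Lighthouse", "A Door in the Orchard"]
selftexts = [
    "The keeper lit the lamp one final time.",
    "Nobody remembered planting it.",
]

rng = random.Random(42)  # seeded only so this sketch is reproducible
concatenated_texts = [
    build_training_text(title, selftext, rng)
    for title, selftext in zip(titles, selftexts)
]

for text in concatenated_texts:
    print(text)
```

Because the "</s>" token sits between the title and "Story: ", a prompt built the same way (prompt, space, title, "</s>", "Story: ") is presumably the natural input format at inference time, though the card does not state this.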

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
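
As a rough sketch, the settings above can be written as a plain dictionary whose keys follow the keyword names of transformers.TrainingArguments. That mapping is an assumption about how the run was configured (in particular fp16=True for "Native AMP"); the card only lists the values. The helper effective_batch_size is hypothetical and shows how total_train_batch_size follows from the per-device batch size and gradient accumulation, assuming a single device.

```python
# Values as reported on the card; key names assume transformers.TrainingArguments keywords.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,  # assumed mapping for "mixed_precision_training: Native AMP"
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step (hypothetical helper)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


# Matches the reported total_train_batch_size when one device is assumed.
total_train_batch_size = effective_batch_size(training_kwargs)
print(total_train_batch_size)
```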

Training results

Final reported loss: {'loss': 0.0135, 'grad_norm': 0.6041152477264404, 'learning_rate': 7.446808510638299e-07, 'epoch': 9.89}

Averaged over the full fine-tuning run: {'train_runtime': 1111.4187, 'train_samples_per_second': 1.71, 'train_steps_per_second': 0.423, 'train_loss': 0.4682149670225509, 'epoch': 9.89}
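
The reported numbers can be cross-checked with a little arithmetic. The sketch below is an illustration under stated assumptions, not something the card confirms: it assumes the linear schedule decays to zero with no warmup, that the throughput figures are rounded, and that the dataset size can be inferred from samples per second. All variable names are introduced here for the example.

```python
# Figures copied from the two log dictionaries on the card.
train_runtime = 1111.4187          # seconds
steps_per_second = 0.423
samples_per_second = 1.71
base_lr = 5e-05
last_logged_lr = 7.446808510638299e-07
num_epochs = 10
total_train_batch_size = 4

# Optimizer steps implied by the throughput (about 470).
total_steps = round(train_runtime * steps_per_second)

# Training examples per epoch implied by the throughput (about 190, an inference).
examples_per_epoch = round(train_runtime * samples_per_second / num_epochs)

# Under linear decay to zero without warmup (assumption):
#   lr = base_lr * remaining_steps / total_steps
remaining_steps = round(last_logged_lr / base_lr * total_steps)

# Fraction of the data covered after total_steps optimizer steps.
epochs_completed = total_steps * total_train_batch_size / examples_per_epoch

print(total_steps, examples_per_epoch, remaining_steps, round(epochs_completed, 2))
```

Under these assumptions the last logged learning rate corresponds to roughly 7 of about 470 scheduler steps remaining, and about 470 steps of 4 examples over roughly 190 examples per epoch agrees with the reported epoch value of 9.89.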

Framework versions

  • Transformers 4.38.1
  • Pytorch 2.2.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2

Model size: 1.72B params (Safetensors, tensor type F32)
Model repository: alonzogarbanzo/Bloom-1b7-creative-writing-IT-baseline