---
license: other
license_name: jamba-open-model-license
license_link: https://www.ai21.com/licenses/jamba-open-model-license
language:
- en
- fr
- de
- nl
- es
- pt
- it
- ar
- he
pipeline_tag: text-generation
tags:
- mamba
- jamba
- moe
library_name: transformers
---

# Spellbound Jamba Mini: Creative output over long contexts

  <img src="https://i.imgur.com/IG4IKfV.png" width=500 height=500>

# Main Goals

### The main goals of the base model choice and post-training regime are

- Strong steerability
- Coherence over long context lengths
- Flexible writing styles
- Advanced formatting that allows identifying individual speakers


### There was also a secondary training objective: to teach the model to understand and produce directives in XML tags.

- `<${characterName}Description>`: A character definition written as a markdown list of details. For example:
    - Name: Character Name
    - Personality: Character Personality
    - Speaker ID: 32AN4R (see `<quote>` tag below)
    - ...
- `<writingInstructions>`: A block of markdown formatted instructions representing what should happen in the story.
- `<pastStory>`: A block containing the events that precede the story being written.

### Output can optionally include the following tags:

- `<quote speaker="{speakerId}">`: When a character is defined with a speaker ID, the model will output the speech surrounded by `<quote speaker="{speakerId}">` and `</quote>`. The model learns to keep speech in character this way, and it allows for identifying different speakers for rendering and text-to-speech purposes
- `<action>`: Represents an action taken by a character
- `<sound>`: Represents a sound made in the story

**Instructing the model to produce these tags is optional**, but the model should produce its best output when the frontend in use can parse or ignore them.
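
A frontend that wants to parse or ignore these tags could do so with a small regular-expression pass. This is a hedged sketch under stated assumptions: the function names `parse_output` and `strip_tags` are hypothetical, and it assumes the tags are well-formed and not nested, which the model may not always guarantee.

```python
import re

# Matches the three output tags described above. Assumes tags are
# well-formed and never nested inside one another.
TAG_PATTERN = re.compile(
    r'<quote speaker="(?P<speaker>[^"]*)">(?P<quote>.*?)</quote>'
    r"|<action>(?P<action>.*?)</action>"
    r"|<sound>(?P<sound>.*?)</sound>",
    re.DOTALL,
)


def parse_output(text):
    """Split model output into narration, quote, action, and sound segments."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Untagged text between matches is treated as narration.
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append({"type": "narration", "text": narration})
        if match.group("quote") is not None:
            segments.append(
                {
                    "type": "quote",
                    "speaker": match.group("speaker"),
                    "text": match.group("quote").strip(),
                }
            )
        elif match.group("action") is not None:
            segments.append({"type": "action", "text": match.group("action").strip()})
        else:
            segments.append({"type": "sound", "text": match.group("sound").strip()})
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append({"type": "narration", "text": tail})
    return segments


def strip_tags(text):
    """Remove the tags but keep their contents, for frontends that ignore them."""
    return re.sub(r"</?(?:quote|action|sound)\b[^>]*>", "", text)


# Hypothetical sample output for illustration.
sample = (
    'The door creaked. <sound>creak</sound> '
    '<quote speaker="32AN4R">Who is there?</quote> '
    "<action>Mira raises her lantern</action>"
)
segments = parse_output(sample)
for segment in segments:
    print(segment)
```

The `speaker` field on quote segments is what a renderer or text-to-speech layer would use to pick a voice or style per character.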

# Post-training Details

## Post-training consists of 1 epoch of SFT LoRA training

- Trained on synthetic instructions for strong steerability
- Outputs rated by [tryspellbound.com](https://tryspellbound.com) beta users who opted in
- LoRA Rank: 8
- Batch Size: 2
- Learning Rate: 1e-5
  
# Model Creator

Made by [tryspellbound.com](https://tryspellbound.com).