Text Generation
Transformers
Safetensors
jamba
conversational
custom_code
Inference Endpoints
ptrdvn committed on
Commit 9a79178 • 1 Parent(s): ba4cd78

Update README.md

Files changed (1):
  1. README.md +110 -162
README.md CHANGED

---
library_name: transformers
tags: []
---

# Model Overview

This model was trained as a small-scale experiment to determine how easy it is to fine-tune [ai21labs/Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1) to work as a chatbot.

The aim of this experiment was to find out how intelligently and reliably Jamba can chat in both English and other languages when QLoRA finetuned for only a few hours.

Initial subjective testing has shown that this model can chat reasonably well in both English and Japanese, so feel free to give it a try!

```python
print(tokenizer.batch_decode([outputs[0][len(input_ids[0]):]]))
# ['汗が出ることは、運動をするときに体温が上がり、体内の熱を外部に放出するための自然なメカニズムです。汗が出ることが多いことは、一般的には、体の温度調節機能が働いていることを意味します。しかし、汗が出ることが多すぎると、不快感や汗症などの問題が発生することがあります。以下に、汗が出ることが多い場合の対策を紹介します。\n\n1. 適切な服装を選ぶ: 汗が出ることが多い場合、軽量で透湿性の高い服を選ぶことが重要です。これにより、汗が体から外部に�']
```
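
The sample output above is the model explaining, in Japanese, why we sweat during exercise and how to manage heavy sweating. For a fuller picture, a minimal chat round-trip might look like the sketch below; the repository-id placeholder, 4-bit loading, sampling settings, and example prompt are illustrative assumptions rather than the card's exact example, and it assumes the tokenizer ships the ChatML chat template used during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "<this-model-repo-id>"  # placeholder: point this at the model repository

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # Jamba uses custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Optional 4-bit loading (the same precision used for QLoRA training) to reduce memory
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
)

# The model was finetuned on ChatML-formatted chats that use this system message
messages = [
    {"role": "system", "content": "You are GPT-4, a helpful assistant."},
    {"role": "user", "content": "なぜ運動すると汗をたくさんかくのですか?"},  # illustrative prompt
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.batch_decode([outputs[0][len(input_ids[0]):]]))
```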

# Initial testing results

# Training details

The model was trained on 2 open source datasets (one multilingual) for one epoch in an A100 (80GB) x 4 environment for 3 hours.

## Training data

* [jondurbin/airoboros-3.2](https://huggingface.co/datasets/jondurbin/airoboros-3.2)

A ~59K example dataset of curated LLM tasks in English, primarily generated with GPT-4. This dataset has been used by some of the best performing open source LLMs in the world (e.g. [jondurbin/bagel-7b-v0.4](https://huggingface.co/jondurbin/bagel-7b-v0.4), [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO)) and contains a wide variety of tasks, so we hypothesized that this would lead to a multi-talented, accurate model. For this reason, this dataset was chosen for the bulk of our training data.

Note: Each element in jondurbin/airoboros-3.2 already contains a system message.

* [openchat/openchat_sharegpt4_dataset](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset) (GPT-4 responses only)

A ~6K example dataset of multilingual multi-turn chats between users and GPT-4. While jondurbin/airoboros-3.2 has delivered good results for models previously, it sadly contains no (or seemingly very little) multilingual data. We are a Japanese AI company, so we require an LLM that can also output in Japanese. Hence we also selected a small, seemingly high quality dataset of GPT-4 responses in many languages from the ShareGPT dataset. We chose to select only the GPT-4 responses because we wanted to keep our dataset as small and high quality as possible to maximise the efficiency of our training.

Note: openchat/openchat_sharegpt4_dataset does not contain system messages, so we added 'You are GPT-4, a helpful assistant.' as our system message (see the illustrative record after this list).
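
Both sources are combined into one ShareGPT-style `conversations` dataset, which is the `type: sharegpt` input format referenced in the training config further down. A hypothetical record (values invented for illustration) looks roughly like this:

```python
# Hypothetical ShareGPT-style training record; the field and role names match the format used above
example_record = {
    "conversations": [
        {"from": "system", "value": "You are GPT-4, a helpful assistant."},
        {"from": "human", "value": "Summarise the plot of Hamlet in two sentences."},
        {"from": "gpt", "value": "Prince Hamlet feigns madness while seeking revenge on his uncle Claudius, who murdered his father and married his mother. His quest ends in a duel that leaves most of the Danish court, including Hamlet himself, dead."},
    ]
}
```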

<details>
<summary>Data preparation code</summary>

```python
import os
import pandas as pd
from datasets import load_dataset, Dataset, concatenate_datasets

os.environ['HF_HOME'] = "/workspace/hf_home"
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = "1"

# Load the English instruction-following data (each element already includes a system message)
boros_dataset = load_dataset("jondurbin/airoboros-3.2", split='train')

# Load the multilingual GPT-4 ShareGPT chats and prepend our system message to every conversation
gpt4_df = pd.read_json("https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_gpt4.json?download=true")
gpt4_df["conversations"] = gpt4_df["items"].apply(lambda x: [{'from': 'system', 'value': 'You are GPT-4, a helpful assistant.'}] + x)

gpt4_dataset = Dataset.from_pandas(gpt4_df[["conversations"]])

# Combine both sources, shuffle, and write the ShareGPT-format JSON used for training
dataset = concatenate_datasets([gpt4_dataset, boros_dataset]).shuffle()

dataset.select_columns(["conversations"]).to_json("/workspace/airoboros-3.2_plus_openchat_sharegpt4.json")
```
</details>
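
As an optional sanity check, the prepared file can be loaded back with `datasets` to confirm the record structure before training (illustrative snippet):

```python
from datasets import load_dataset

# Load the JSON-lines file written above and inspect one conversation turn
ds = load_dataset("json", data_files="/workspace/airoboros-3.2_plus_openchat_sharegpt4.json", split="train")
print(len(ds))                    # total conversations (roughly 59K + 6K)
print(ds[0]["conversations"][0])  # one turn, e.g. {'from': ..., 'value': ...}
```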

## Training
The Jamba-v0.1 base model was trained for roughly 3 hours in an A100 (80GB) x 4 environment on the Azure cloud (Standard_NC96ads_A100_v4).

Our training harness was Axolotl, with the following config as our training parameters:

<details>
<summary>Training config</summary>

```yaml
base_model: ai21labs/Jamba-v0.1
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: /workspace/airoboros-3.2_plus_openchat_sharegpt4.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./airoboros-3.2_plus_openchat_sharegpt4_one_epoch

sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true

use_wandb: true
wandb_project: axolotl
wandb_entity: peterd
wandb_name: airoboros-3.2_plus_openchat_sharegpt4

adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 5
saves_per_epoch: 5
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
```
</details>
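
Because the dataset is declared with `type: sharegpt` and `conversation: chatml`, each record is rendered into the ChatML prompt format during preprocessing, roughly like this (illustrative):

```
<|im_start|>system
You are GPT-4, a helpful assistant.<|im_end|>
<|im_start|>user
Summarise the plot of Hamlet in two sentences.<|im_end|>
<|im_start|>assistant
Prince Hamlet feigns madness while seeking revenge on his uncle Claudius, who murdered his father and married his mother.<|im_end|>
```

With Axolotl installed, a config like this is typically launched with something like `accelerate launch -m axolotl.cli.train config.yaml`, with multi-GPU sharding handled by the DeepSpeed ZeRO-2 config referenced in the `deepspeed` field.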

<details>
<summary>Training graphs</summary>

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/umxTIsNRHUtKS_kL81Uyf.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/mpuCoL99rxX8RCgXH1CJo.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/5FvwYNdte-bgzEvcvFO8I.png)

</details>

<br/>

# Developers

Lead developer - Peter Devine
Administrative supervisor - Shunichi Taniguchi