---
library_name: transformers
license: apache-2.0
datasets:
- isek-ai/danbooru-tags-2023
base_model: p1atdev/dart-v1-base
tags:
- trl
- sft
- danbooru
inference: false
---
# Dart (Danbooru Tags Transformer) v1
This model is a fine-tuned Dart (**Da**nboo**r**u **T**ags Transformer) model that generates danbooru tags.
Demo: [🤗 Space](https://huggingface.co./spaces/p1atdev/danbooru-tags-transformer)
If you are a developer and want to finetune the model, it is recommended to start from the base version, [p1atdev/dart-v1-base](https://huggingface.co./p1atdev/dart-v1-base), instead.
## Usage
### Using AutoModel
🤗 Transformers library is required.
```bash
pip install -U transformers
```
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

MODEL_NAME = "p1atdev/dart-v1-sft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)  # trust_remote_code is required for the tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, "
inputs = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = model.generate(inputs, generation_config=generation_config)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# rating:sfw, rating:general, original, 1girl, ahoge, black hair, blue eyes, blush, closed mouth, ear piercing, earrings, jewelry, looking at viewer, mole, mole under eye, piercing, portrait, shirt, short hair, solo, white shirt
```
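If the generation config shipped with the repository cannot be loaded, one can also be constructed manually. The sampling values below are illustrative assumptions, not settings published for this model:
```py
from transformers import GenerationConfig

# Illustrative values only; tune them to your needs.
generation_config = GenerationConfig(
    max_new_tokens=128,
    do_sample=True,
    temperature=1.0,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,  # tokenizer loaded above
    pad_token_id=tokenizer.pad_token_id,
)
```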
#### Flash attention (optional)
Flash attention can speed up computation, but it is currently only supported on Linux.
```bash
pip install flash_attn
```
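After installing it, flash attention can be enabled when loading the model. This is a minimal sketch; the `attn_implementation="flash_attention_2"` argument is the standard 🤗 Transformers mechanism and is assumed, not confirmed, to work with this checkpoint:
```py
import torch
from transformers import AutoModelForCausalLM

# Assumption: this checkpoint works with the flash_attention_2 implementation flag.
model = AutoModelForCausalLM.from_pretrained(
    "p1atdev/dart-v1-sft",
    torch_dtype=torch.bfloat16,  # flash attention requires fp16/bf16 weights
    attn_implementation="flash_attention_2",
)
```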
### Accelerate with ORTModel
The 🤗 Optimum library is also supported, enabling high-performance inference with ONNX Runtime.
```bash
pip install "optimum[onnxruntime]"
```
Two ONNX models are provided:
- [Normal](./model.onnx)
- [Quantized](./model_quantized.onnx)
Both can be used with the following code:
```py
import torch
from transformers import AutoTokenizer, GenerationConfig
from optimum.onnxruntime import ORTModelForCausalLM

MODEL_NAME = "p1atdev/dart-v1-sft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)

# normal version
ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME)

# quantized version
# ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME, file_name="model_quantized.onnx")

prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, "
inputs = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    outputs = ort_model.generate(inputs, generation_config=generation_config)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Prompt guide
Due to training with a specialized prompt format, **natural language is not supported**.
The trained sentences are essentially composed of the following elements, arranged in the strict order shown below:
- `<|bos|>`: The bos (beginning of sentence) token
- `<rating>[RATING_PARENT], [RATING_CHILD]</rating>`: The block of rating tags
  - [RATING_PARENT]: `rating:sfw` or `rating:nsfw`
  - [RATING_CHILD]:
    - if `[RATING_PARENT]` is `rating:sfw`: `rating:general` or `rating:sensitive`
    - else: `rating:questionable` or `rating:explicit`
- `<copyright>[COPYRIGHT, ...]</copyright>`: The block of copyright tags.
  - [COPYRIGHT, ...]: All supported copyright tags can be seen in [TODO]()
- `<character>[CHARACTER, ...]</character>`: The block of character tags.
  - [CHARACTER, ...]: All supported character tags can be seen in [TODO]()
- `<general>[LENGTH_TOKEN][GENERAL, ...]<|input_end|>[COMPLETION]</general>`: The block of general tags.
  - [LENGTH_TOKEN]: A token that specifies the **total** number of general tags.
    - Available:
      - `<|very_short|>`: less than 10 tags
      - `<|short|>`: less than 20 tags
      - `<|long|>`: less than 40 tags (recommended)
      - `<|very_long|>`: more than 40 tags
  - [GENERAL, ...]: All supported general tags can be seen in [TODO]()
  - `<|input_end|>`: A token marking the end of the input. Place it at the end of the prompt.
  - [COMPLETION]: The part the model completes, in alphabetical order.
- `<|eos|>`: The eos (end of sentence) token
- Tags other than special tokens are separated by commas.
- Within each block, you can place tags in any order you like.
Example sentence:
```
<|bos|><rating>rating:sfw, rating:general</rating><copyright>vocaloid</copyright><character>hatsune miku</character><general><|long|>solo, 1girl, very long hair<|input_end|>blue hair, cowboy shot, ...</general><|eos|>
```
Therefore, to complete the tags, the input prompt should be as follows:
1. Without any copyright or character tags:
```
<|bos|><rating>rating:sfw, rating:general</rating><copyright></copyright><character></character><general><|very_long|>1girl, solo, cat ears<|input_end|>
```
2. Specifying copyright and character tags:
```
<|bos|><rating>rating:sfw, rating:general</rating><copyright>sousou no frieren</copyright><character>frieren</character><general><|long|>1girl, solo, from side<|input_end|>
```
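For convenience, the prompt can also be assembled programmatically. The helper below is a hypothetical sketch (not part of this repository) that simply concatenates the blocks in the order described above:
```py
def build_prompt(
    rating: str = "rating:sfw, rating:general",
    copyright: str = "",
    character: str = "",
    length: str = "<|long|>",
    general: str = "",
) -> str:
    """Assemble a Dart v1 prompt from its blocks (hypothetical helper)."""
    return (
        "<|bos|>"
        f"<rating>{rating}</rating>"
        f"<copyright>{copyright}</copyright>"
        f"<character>{character}</character>"
        f"<general>{length}{general}<|input_end|>"
    )

print(build_prompt(copyright="sousou no frieren", character="frieren", general="1girl, solo, from side"))
# <|bos|><rating>rating:sfw, rating:general</rating><copyright>sousou no frieren</copyright><character>frieren</character><general><|long|>1girl, solo, from side<|input_end|>
```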
## Model Details
### Model Description
- **Developed by:** Plat
- **Model type:** Causal language model
- **Language(s) (NLP):** Danbooru tags
- **License:** Apache-2.0
- **Demo:** Available on [🤗 Space](https://huggingface.co./spaces/p1atdev/danbooru-tags-transformer)
## Bias, Risks, and Limitations
Because this is a pre-trained tag generator, it cannot accommodate flexible specifications.
## Training Details
### Training Data
This model was trained with:
- [isek-ai/danbooru-tags-2023](https://huggingface.co./datasets/isek-ai/danbooru-tags-2023): a dataset of about 6M danbooru tag records covering 2005 to 2023
Only data from 2020 onwards was used for SFT.
### Training Procedure
Trained with the 🤗 Transformers `Trainer`.
#### Preprocessing
Preprocessing was conducted through the following steps:
1. Remove data where the `general` tags field is null.
2. Remove `general` tags that appear fewer than 100 times.
3. Remove undesirable tags such as `watermark` and `bad anatomy`.
4. Remove posts based on the number of tags attached to a single post, using the following rules:
   - Remove if there are more than 100 `general` tags.
   - Remove if there are more than 5 `copyright` tags.
   - Remove if there are more than 10 `character` tags.
5. Remove posts created before 2020.
6. Set the length token according to the number of general tags.
7. Shuffle some tags according to the following rules (see the sketch after this list):
   - Include people tags (e.g. `1girl`, `no humans`) in the shuffle group with 95% probability, and exclude them with 5% probability.
   - Select a random fraction of the tags, between 0% and 75%, to form the shuffle group.
   - Shuffle the tags in the shuffle group, then concatenate them with the `<|input_end|>` token followed by the remaining tags in alphabetical order.
8. Concatenate all categories.
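The tag-shuffling step can be illustrated with a rough sketch. This is not the actual preprocessing script (which has not been published); the function name and the people-tag list are assumptions:
```py
import random

PEOPLE_TAGS = {"1girl", "1boy", "no humans"}  # assumption: illustrative subset only

def shuffle_general_tags(tags: list[str]) -> str:
    """Split general tags into a shuffled prefix and an alphabetical completion."""
    tags = sorted(tags)
    include_people = random.random() < 0.95  # people tags join the shuffle group 95% of the time
    people = [t for t in tags if t in PEOPLE_TAGS]
    others = [t for t in tags if t not in PEOPLE_TAGS]

    # take a random 0-75% fraction of the remaining tags for the shuffle group
    k = int(len(others) * random.uniform(0.0, 0.75))
    shuffle_group = random.sample(others, k)
    if include_people:
        shuffle_group += people

    remains = sorted(t for t in tags if t not in shuffle_group)
    random.shuffle(shuffle_group)

    # everything before <|input_end|> is shuffled, everything after stays alphabetical
    return ", ".join(shuffle_group) + "<|input_end|>" + ", ".join(remains)

print(shuffle_general_tags(["1girl", "solo", "cat ears", "blue hair", "smile"]))
```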
#### Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1
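For reference, the hyperparameters above roughly correspond to a `TrainingArguments` configuration like the following (a sketch; the actual training script has not been published, and the output directory is a placeholder):
```py
from transformers import TrainingArguments

# Hypothetical reconstruction of the training setup from the hyperparameters above.
args = TrainingArguments(
    output_dir="./dart-v1-sft",     # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size of 64
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
    seed=42,
)
```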
## Evaluation
Evaluation has not been conducted yet and still needs to be done.
## Technical Specifications
### Model Architecture and Objective
The architecture of this model is [OPT (Open Pretrained Transformer)](https://huggingface.co./docs/transformers/model_doc/opt), but the position embeddings were not trained.
### Compute Infrastructure
In-house
#### Hardware
1x RTX 3070 Ti
#### Software
- Dataset processing: [🤗 Datasets](https://github.com/huggingface/datasets)
- Training: [🤗 Transformers](https://github.com/huggingface/transformers)
- Optimizing: [🤗 Optimum](https://github.com/huggingface/optimum)
- SFT: [🤗 TRL](https://github.com/huggingface/trl)