p1atdev/dart-v1-sft

@@ -1,201 +1,238 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
 ### Model Architecture and Objective
-[More Information Needed]
 ### Compute Infrastructure
-[More Information Needed]
 #### Hardware
-[More Information Needed]
 #### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
 ## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+license: apache-2.0
+datasets:
+- isek-ai/danbooru-tags-2023
+base_model: p1atdev/dart-v1-base
+tags:
+- trl
+- sft
+- danbooru
 ---
+# Dart (Danbooru Tags Transformer) v1
+This model is a fine-tuned Dart (**Da**nboo**r**u **T**ags Transformer) model that generates danbooru tags.
+Demo: [🤗 Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer)
+If you are a developer and want to fine-tune the model, the base version, [p1atdev/dart-v1-base](https://huggingface.co/p1atdev/dart-v1-base), is recommended instead.
+## Usage
+### Using AutoModel
+The 🤗 Transformers library is required.
+```bash
+pip install -U transformers
+```
+```py
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+MODEL_NAME = "p1atdev/dart-v1-sft"
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) # trust_remote_code is required for tokenizer
+model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
+prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, "
+inputs = tokenizer(prompt, return_tensors="pt").input_ids
+# use the generation settings that were loaded together with the model
+with torch.no_grad():
+  outputs = model.generate(inputs, generation_config=model.generation_config)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# rating:sfw, rating:general, original, 1girl, ahoge, black hair, blue eyes, blush, closed mouth, ear piercing, earrings, jewelry, looking at viewer, mole, mole under eye, piercing, portrait, shirt, short hair, solo, white shirt
+```
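The decoded output is a single comma-separated string, as in the example comment above. A minimal sketch of turning it into a list of tags follows; `split_tags` is a hypothetical helper name and not part of the model's tokenizer.

```python
def split_tags(decoded: str) -> list[str]:
    """Split a decoded, comma-separated tag string into a clean list of tags."""
    # Strip surrounding whitespace and drop empty fragments (e.g. a trailing comma).
    return [tag.strip() for tag in decoded.split(",") if tag.strip()]


# Shortened version of the example output shown above.
decoded = "rating:sfw, rating:general, original, 1girl, ahoge, black hair"
tags = split_tags(decoded)
print(tags)
# ['rating:sfw', 'rating:general', 'original', '1girl', 'ahoge', 'black hair']
```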
+#### Flash attention (optional)
+Using flash attention can optimize computations, but it is currently only compatible with Linux.
+```bash
+pip install flash_attn
+```
+### Accelerate with ORTModel
+The 🤗 Optimum library is also supported, for high-performance inference with ONNX.
+```bash
+pip install "optimum[onnxruntime]"
+```
+Two ONNX models are provided:
+- [Normal](./model.onnx)
+- [Quantized](./model_quantized.onnx)
+Either one can be loaded with the following code:
+```py
+import torch
+from transformers import AutoTokenizer
+from optimum.onnxruntime import ORTModelForCausalLM
+MODEL_NAME = "p1atdev/dart-v1-sft"
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
+# normal version
+ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME)
+# quantized version
+# ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME, file_name="model_quantized.onnx")
+prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, "
+inputs = tokenizer(prompt, return_tensors="pt").input_ids
+# use the generation settings attached to the ONNX model
+with torch.no_grad():
+  outputs = ort_model.generate(inputs, generation_config=ort_model.generation_config)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+### Prompt guide
+Due to training with a specialized prompt format, **natural language is not supported**.
+The training sentences are composed of the following elements, arranged in the strict order shown below:
+- `<|bos|>`: The bos (begin of sentence) token
+- `<rating>[RATING_PARENT], [RATING_CHILD]</rating>`: The block of rating tags
+  - [RATING_PARENT]: `rating:sfw`, `rating:nsfw`
+  - [RATING_CHILD]:
+    - if `[RATING_PARENT]` is `rating:sfw`: `rating:general`, `rating:sensitive`
+    - else: `rating:questionable`, `rating:explicit`
+- `<copyright>[COPYRIGHT, ...]</copyright>`: The block of copyright tags.
+  - [COPYRIGHT, ...]: All supported copyright tags can be seen in [TODO]()
+- `<character>[CHARACTER, ...]</character>`: The block of character tags.
+  - [CHARACTER, ...]: All supported character tags can be seen in [TODO]()
+- `<general>[LENGTH_TOKEN][GENERAL, ...]<|input_end|>[COMPLETION]</general>`: The block of general tags.
+  - [LENGTH_TOKEN]: A token that specifies the **total** number of general tags.
+    - Available:
+      - `<|very_short|>`: less than 10 tags
+      - `<|short|>`: less than 20 tags
+      - `<|long|>`: less than 40 tags (recommended)
+      - `<|very_long|>`: more than 40 tags
+  - [GENERAL, ...]:  All supported general tags can be seen in [TODO]()
+  - `<|input_end|>`: A token that marks the end of the input. Place this token at the end of the prompt.
+  - [COMPLETION]: The model completes the tags in alphabetical order.
+- `<|eos|>`: The eos (end of sentence) token
+- Tags other than special tokens are separated by commas.
+- You can place tags in any order you like in each block.
+Example sentence:
+```
+<|bos|><rating>rating:sfw, rating:general</rating><copyright>vocaloid</copyright><character>hatsune miku</character><general><|long|>solo, 1girl, very long hair<|input_end|>blue hair, cowboy shot, ...</general><|eos|>
+```
+Therefore, to complete the tags, the input prompt should be as follows:
+1. Without any copyright or character tags
+```
+<|bos|><rating>rating:sfw, rating:general</rating><copyright></copyright><character></character><general><|very_long|>1girl, solo, cat ears<|input_end|>
+```
+2. Specifying copyright and character tags
+```
+<|bos|><rating>rating:sfw, rating:general</rating><copyright>sousou no frieren</copyright><character>frieren</character><general><|long|>1girl, solo, from side<|input_end|>
+```
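The prompt format above can be assembled programmatically. The following is a minimal sketch, not the author's tooling: `build_prompt`, `VALID_RATINGS`, and `LENGTH_TOKENS` are hypothetical names, and the function only checks the rating pairing and the length token described in the guide. It does not check whether the tags themselves are supported by the model.

```python
# Hypothetical helper that assembles a prompt in the block order described above.
VALID_RATINGS = {
    "rating:sfw": {"rating:general", "rating:sensitive"},
    "rating:nsfw": {"rating:questionable", "rating:explicit"},
}
LENGTH_TOKENS = {"very_short", "short", "long", "very_long"}


def build_prompt(rating_parent, rating_child, general,
                 copyrights=(), characters=(), length="long"):
    """Return an input prompt that ends with <|input_end|>, ready for completion."""
    if rating_child not in VALID_RATINGS.get(rating_parent, set()):
        raise ValueError(f"{rating_child!r} is not a valid child of {rating_parent!r}")
    if length not in LENGTH_TOKENS:
        raise ValueError(f"unknown length token: {length!r}")
    # Tags inside each block are comma-separated; special tokens are not.
    return (
        "<|bos|>"
        f"<rating>{rating_parent}, {rating_child}</rating>"
        f"<copyright>{', '.join(copyrights)}</copyright>"
        f"<character>{', '.join(characters)}</character>"
        f"<general><|{length}|>{', '.join(general)}<|input_end|>"
    )


prompt = build_prompt(
    "rating:sfw", "rating:general", ["1girl", "solo", "from side"],
    copyrights=["sousou no frieren"], characters=["frieren"],
)
print(prompt)
```

With these arguments the result matches the second example prompt above. The closing `</general>` and `<|eos|>` are left out on purpose, since the model is expected to generate them.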
+## Model Details
+### Model Description
+- **Developed by:** Plat
+- **Model type:** Causal language model
+- **Language(s) (NLP):** Danbooru tags
+- **License:** Apache-2.0
+- **Demo:** Available on [🤗Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer)
+## Bias, Risks, and Limitations
+Since this model is a pre-trained model, it cannot accommodate flexible specifications.
+## Training Details
+### Training Data
+This model was trained with:
+- [isek-ai/danbooru-tags-2023](https://huggingface.co/datasets/isek-ai/danbooru-tags-2023): a 6M-entry Danbooru tags dataset covering 2005 to 2023
+Only data from 2020 onwards was used for SFT.
+### Training Procedure
+Trained using 🤗 transformers' trainer.
+#### Preprocessing
+Preprocessing was conducted through the following process:
+1. Remove posts whose `general` tags are null.
+2. Remove `general` tags that appear less than 100 times.
+3. Remove undesirable tags such as `watermark` and `bad anatomy`.
+4. Remove posts based on the number of tags attached to a single post, using the following rules:
+  - Remove if more than 100 for `general` tags.
+  - Remove if more than 5 for `copyright` tags.
+  - Remove if more than 10 for `character` tags.
+5. Remove posts created before 2020
+6. Set the length token according to the number of tags in each post
+7. Shuffle some tags in the following rule:
+  - Include people tags (e.g. `1girl`, `no humans`) in the shuffle-group with a 95% probability, and leave them out with a 5% probability.
+  - Take a random percentage of the tags, between 0% and 75%, to create the shuffle-group.
+  - Shuffle the tags in the shuffle-group, then concatenate them with the `<|input_end|>` token and the remaining tags in alphabetical order.
+8. Concatenate all categories
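Steps 6 and 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's preprocessing code: `PEOPLE_TAGS` is a small illustrative subset, the random percentage is assumed to apply to the non-people tags, and a post with exactly 40 tags is assumed to fall under `<|very_long|>`, which the length-token list leaves unspecified.

```python
import random

# Illustrative subset only; the real list of people tags is not given in this card.
PEOPLE_TAGS = {"1girl", "1boy", "no humans"}


def length_token(num_tags: int) -> str:
    """Map the total number of general tags to a length token (step 6)."""
    if num_tags < 10:
        return "<|very_short|>"
    if num_tags < 20:
        return "<|short|>"
    if num_tags < 40:
        return "<|long|>"
    return "<|very_long|>"  # assumption: exactly 40 tags is treated as very long


def format_general(tags, rng: random.Random) -> str:
    """Build the body of the <general> block for one post (step 7)."""
    tags = sorted(set(tags))
    people = [t for t in tags if t in PEOPLE_TAGS]
    others = [t for t in tags if t not in PEOPLE_TAGS]
    # Pick between 0% and 75% of the non-people tags for the shuffle-group.
    group = rng.sample(others, int(len(others) * rng.uniform(0.0, 0.75)))
    # People tags join the shuffle-group with 95% probability.
    if rng.random() < 0.95:
        group += people
    rng.shuffle(group)
    chosen = set(group)
    rest = [t for t in tags if t not in chosen]  # remains in alphabetical order
    return f"{length_token(len(tags))}{', '.join(group)}<|input_end|>{', '.join(rest)}"


example = ["solo", "1girl", "long hair", "blue eyes", "smile", "school uniform"]
print(format_general(example, random.Random(0)))
```

Passing a seeded `random.Random` keeps the sketch reproducible. The shuffled part plays the role of the user's input at inference time, and the alphabetical remainder is what the model learns to complete.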
+#### Training Hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 32
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 1
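Two of these values can be made concrete with a short calculation: how `total_train_batch_size` follows from the other settings, and what a linear schedule with 500 warmup steps looks like. The sketch below assumes a single device (the Hardware section lists one GPU) and a hypothetical total of 10,000 training steps, since the real step count is not stated. `linear_schedule_lr` is an illustrative function that mirrors the usual linear warmup and decay rule, not code taken from the training run.

```python
LEARNING_RATE = 1e-4
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: single GPU

# Effective batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES
print(total_train_batch_size)  # 64


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = LEARNING_RATE,
                       warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


TOTAL_STEPS = 10_000  # hypothetical; the actual number of steps is not given
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```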
+## Evaluation
+Evaluation has not been done yet and is still needed.
+## Technical Specifications
 ### Model Architecture and Objective
+The architecture of this model is [OPT (Open Pretrained Transformer)](https://huggingface.co/docs/transformers/model_doc/opt), but the position embeddings were not trained.
 ### Compute Infrastructure
+In house
 #### Hardware
+1x RTX 3070 Ti
 #### Software
+- Dataset processing: [🤗 Datasets](https://github.com/huggingface/datasets)
+- Training: [🤗 Transformers](https://github.com/huggingface/transformers)
+- Optimizing: [🤗 Optimum](https://github.com/huggingface/optimum)
+- SFT: [🤗 TRL](https://github.com/huggingface/trl)
 ## More Information [optional]
+[More Information Needed]