---
license: mit
---
# Florence-2-base-PromptGen

Florence-2-base-PromptGen is a model trained for [MiaoshouAI Tagger for ComfyUI](https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger).
It is an image captioning model based on the [Microsoft Florence-2 Model](https://huggingface.co./microsoft/Florence-2-base), fine-tuned for prompt generation and image tagging.

## Why another tagging model?
Most vision models today are trained for general-purpose visual recognition. Prompting and image tagging for model training call for captions with a quite different format and level of detail.
 
Florence-2-base-PromptGen is trained to improve both the experience and the accuracy of prompting and tagging. It is trained on images and cleaned tags from Civitai, so the tags it produces for an image resemble the prompts you would use to generate that image.

## Instruction prompt:
A new instruction prompt, \<GENERATE_PROMPT\>, was added for this purpose alongside \<DETAILED_CAPTION\> and \<MORE_DETAILED_CAPTION\>.
It responds in danbooru tagging style, with much better accuracy and an appropriate level of detail.
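
As a small convenience, the three instruction prompts can be kept in one lookup table so that a typo in a task token fails early instead of reaching the model. This is only a sketch: the mode names `tags`, `caption`, and `long_caption` and the helper `task_token` are hypothetical and not part of the model or the tagger.

```python
# Instruction prompts named in this README.
# The dictionary keys are hypothetical, friendly labels chosen for this sketch.
TASK_TOKENS = {
    "tags": "<GENERATE_PROMPT>",
    "caption": "<DETAILED_CAPTION>",
    "long_caption": "<MORE_DETAILED_CAPTION>",
}


def task_token(mode: str) -> str:
    """Return the instruction prompt for a mode, failing loudly on unknown modes."""
    try:
        return TASK_TOKENS[mode]
    except KeyError:
        valid = ", ".join(sorted(TASK_TOKENS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None


prompt = task_token("tags")
print(prompt)
```

The returned string is what gets passed as the text prompt to the processor.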

## How to use:

To use this model, you can load it directly from the Hugging Face Model Hub:

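```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True)

# Instruction prompt that asks for a danbooru-style tag prompt.
prompt = "<GENERATE_PROMPT>"

url = "https://huggingface.co./datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

# Tokenize the prompt, preprocess the image, and move both to the model's device.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)
# Decode with special tokens kept, then let the processor clean up the text for this task.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))

print(parsed_answer)
```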
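
If you want the result as a list of individual tags rather than one string, a small post-processing step can help. The sketch below assumes that `parsed_answer` is a dictionary mapping the task token to a comma-separated string; that shape is an assumption, so check it against your own output. The helper `split_tags` and the sample input are hypothetical, made up for illustration rather than taken from the model.

```python
def split_tags(parsed_answer: dict, task: str = "<GENERATE_PROMPT>") -> list[str]:
    """Split a comma-separated tag string into a clean, de-duplicated list.

    Assumes parsed_answer maps the task token to a string (hypothetical shape).
    """
    text = parsed_answer.get(task, "")
    seen = set()
    tags = []
    for raw in text.split(","):
        tag = raw.strip()
        # Skip empty entries and case-insensitive repeats, keeping first-seen order.
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return tags


# Made-up sample input, not real model output.
example = {"<GENERATE_PROMPT>": "1girl, solo, long hair, solo, , outdoors"}
print(split_tags(example))  # ['1girl', 'solo', 'long hair', 'outdoors']
```

Keeping the first-seen order preserves whatever ordering the model gave the tags.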

## Use with MiaoshouAI Tagger for ComfyUI
If you just want to use this model without writing code, you can run it through ComfyUI-Miaoshouai-Tagger:

https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger

Detailed installation and usage instructions are available in that repository.