---
license: unlicense
datasets:
- poloclub/diffusiondb
language:
- en
metrics:
- wer
pipeline_tag: image-to-text
---
# Untitled7-colab_checkpoint

This model was lovingly named after the Google Colab notebook that made it. It is a finetune of Microsoft's [git-large-coco](https://huggingface.co./microsoft/git-large-coco) model on the first-1k subset (`2m_first_1k`) of [poloclub/diffusiondb](https://huggingface.co./datasets/poloclub/diffusiondb/viewer/2m_first_1k/train).

It is supposed to read an image and extract a stable diffusion prompt from it, but it might not do a good job at it. I wouldn't know; I haven't tested it extensively.

As the title suggests, this is a checkpoint: I originally intended to train on the entire dataset, but I'm unsure if I want to now...

This is my first public model, so please be nice!
## Intended use

Fun!

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

# Alternatively, use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="SE6446/Untitled7-colab_checkpoint")
```
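
A minimal inference sketch (the image path is a placeholder, not a file shipped with this repo); it follows the standard GIT captioning flow from the `transformers` docs:

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("SE6446/Untitled7-colab_checkpoint")
model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

# "my_image.png" is a placeholder path
image = Image.open("my_image.png")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate a candidate stable diffusion prompt for the image
generated_ids = model.generate(pixel_values=pixel_values, max_length=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```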

## Out-of-scope use

Don't use this model to discriminate against, alienate, or in any other way harm/harass individuals. You guys know the drill...

## Bias, Risks and Limitations

This model does not produce accurate prompts; it is merely a bit of fun (and a waste of funds). It can, however, inherit biases present in the original git-large-coco model.

## Training
*I.e., the boring stuff*

- learning rate: 5e-5
- epochs: 150
- optimizer: AdamW
- precision: fp16
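
For reference, here is a hedged sketch of how these settings might map onto `transformers` `TrainingArguments`. The original Colab training script isn't published, so treat this as an illustration, not a record of the actual run:

```python
# Hypothetical reconstruction of the training configuration above
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Untitled7-colab_checkpoint",  # placeholder output directory
    learning_rate=5e-5,
    num_train_epochs=150,
    optim="adamw_torch",  # AdamW optimizer
    fp16=True,            # mixed-precision training
)
```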

If you want to finetune it further, you should freeze the embedding and vision transformer layers, as in the sketch below.
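
One way to do that, assuming the module names (`embeddings`, `image_encoder`) used in transformers' GIT implementation; check `model.named_parameters()` on your version to confirm:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("SE6446/Untitled7-colab_checkpoint")

# Freeze the text embeddings and the vision transformer (image encoder);
# only the remaining text-decoder weights stay trainable.
for name, param in model.named_parameters():
    if "embeddings" in name or "image_encoder" in name:
        param.requires_grad = False
```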