wjbmattingly committed
Commit e653cce · 1 Parent(s): 152f24a
Update README.md
README.md CHANGED
@@ -1,3 +1,116 @@
|
|
1 |
-
---
|
2 |
-
license: cc-by-4.0
|
3 |
-
|
1 |
+
---
|
2 |
+
license: cc-by-4.0
|
3 |
+
datasets:
|
4 |
+
- CATMuS/medieval-segmentation
|
5 |
+
pipeline_tag: object-detection
|
6 |
+
tags:
|
7 |
+
- medieval
|
8 |
+
- manuscript
|
9 |
+
---
|
10 |
+
|
11 |
+
# Florence 2 Medieval Zone Object Detection
|
12 |
+
|
13 |
+
This is Microsoft's Florence 2 model fine-tuned for 10 epochs on the [CATMuS Medieval Segmentation dataset](https://huggingface.co/datasets/CATMuS/medieval-segmentation) with a learning rate of `1e-6`. The code for fine-tuning can be found [here](). The blog post describing this process can be found [here](). This model would not be possible without the numerous annotators behind the various datasets available on HTR-United (see the dataset for details). Special thanks to [Thibault Clérice](https://huggingface.co/ponteineptique), who converted the original CATMuS dataset (for HTR) to a segmentation dataset.
|
14 |
+
|
15 |
+
# Model Details
|
16 |
+
|
17 |
+
- **Developed by**: [William J.B. Mattingly](https://huggingface.co/wjbmattingly)
|
18 |
+
- **License**: CC-BY 4.0
|
19 |
+
- **Finetuned from model**: [Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)
|
20 |
+
|
21 |
+
## Labels
|
22 |
+
|
23 |
+
The following table lists each label, marks the labels used to train this model (in the Zone and Line columns), gives the number of annotations per split (an image can contain several instances of the same label), and defines each label with a link to the original SegmOnto documentation.
|
24 |
+
|
25 |
+
| Label | Zone | Line | Train Count | Validation Count | Test Count | Definition |
|-------|------|------|-------------|------------------|------------|------------|
| DefaultLine | | ✓ | 81702 | 13554 | 12209 | [A line of text that is not distinguished by any particular features and is part of the main text flow.](https://segmonto.github.io/gd/gdL/DefaultLine/) |
| InterlinearLine | | ✓ | 2808 | 27 | 2234 | [A line of text written between two lines of main text, typically containing glosses, translations, or comments.](https://segmonto.github.io/gd/gdL/InterlinearLine/) |
| MainZone | ✓ | | 2314 | 365 | 275 | [The main textual zone of a page, usually containing the main body of text.](https://segmonto.github.io/gd/gdZ/MainZone/) |
| HeadingLine | | ✓ | 1381 | 701 | 135 | [A line of text that functions as a heading or title for a section of the main text.](https://segmonto.github.io/gd/gdL/HeadingLine/) |
| MarginTextZone | ✓ | | 916 | 146 | 199 | [A text zone in the margin of a page, often containing annotations, commentaries, or other secondary information.](https://segmonto.github.io/gd/gdZ/MarginTextZone/) |
| DropCapitalZone | ✓ | | 1566 | 102 | 124 | [A zone containing a large ornamental initial letter of a paragraph or section, typically extending below the first line of text.](https://segmonto.github.io/gd/gdZ/DropCapitalZone/) |
| NumberingZone | ✓ | | 632 | 102 | 94 | [A zone containing page numbers, folio numbers, or other numerical identifiers for the page.](https://segmonto.github.io/gd/gdZ/NumberingZone/) |
| TironianSignLine | | | 282 | 0 | 0 | [A line containing Tironian notes, an ancient system of shorthand.](https://segmonto.github.io/gd/gdL/TironianSignLine/) |
| DropCapitalLine | | | 1175 | 105 | 92 | [A line of text that begins with a drop capital.](https://segmonto.github.io/gd/gdL/DropCapitalLine/) |
| RunningTitleZone | ✓ | | 340 | 91 | 18 | [A zone containing a running title, typically located at the top of a page and repeating throughout a section or the entire document.](https://segmonto.github.io/gd/gdZ/RunningTitleZone/) |
| GraphicZone | ✓ | | 300 | 7 | 10 | [A zone containing non-textual elements such as images, drawings, or decorative elements.](https://segmonto.github.io/gd/gdZ/GraphicZone/) |
| DigitizationArtefactZone | | | 28 | 0 | 0 | [A zone containing artefacts from the digitization process, such as color bars or reference marks.](https://segmonto.github.io/gd/gdZ/DigitizationArtefactZone/) |
| QuireMarksZone | ✓ | | 86 | 9 | 8 | [A zone containing marks used to indicate the gathering or quire to which a leaf belongs, often found at the bottom of the page.](https://segmonto.github.io/gd/gdZ/QuireMarksZone/) |
| StampZone | ✓ | | 39 | 5 | 4 | [A zone containing a stamp, such as a library stamp or ownership mark.](https://segmonto.github.io/gd/gdZ/StampZone/) |
| DamageZone | ✓ | | 12 | 1 | 0 | [A zone indicating an area of the page that has been damaged or is otherwise illegible due to physical deterioration.](https://segmonto.github.io/gd/gdZ/DamageZone/) |
| MusicZone | ✓ | | 179 | 0 | 0 | [A zone containing musical notation.](https://segmonto.github.io/gd/gdZ/MusicZone/) |
| MusicLine | | | 167 | 0 | 0 | [A line containing musical notation.](https://segmonto.github.io/gd/gdL/MusicLine/) |
| TitlePageZone | ✓ | | 4 | 1 | 1 | [A zone encompassing the entire title page of a book or document.](https://segmonto.github.io/gd/gdZ/TitlePageZone/) |
| SealZone | ✓ | | 3 | 0 | 0 | [A zone containing a seal, typically used for authentication or closure of a document.](https://segmonto.github.io/gd/gdZ/SealZone/) |
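
The counts in the table are heavily skewed, and several labels have no validation or test examples at all. The following is a minimal sketch for inspecting that imbalance. The `LABEL_COUNTS` dictionary and the variable names are illustrative and not part of the model or dataset API; the numbers are copied by hand from the table.

```python
# (train, validation, test) counts copied from the label table.
LABEL_COUNTS = {
    "DefaultLine": (81702, 13554, 12209),
    "InterlinearLine": (2808, 27, 2234),
    "MainZone": (2314, 365, 275),
    "HeadingLine": (1381, 701, 135),
    "MarginTextZone": (916, 146, 199),
    "DropCapitalZone": (1566, 102, 124),
    "NumberingZone": (632, 102, 94),
    "TironianSignLine": (282, 0, 0),
    "DropCapitalLine": (1175, 105, 92),
    "RunningTitleZone": (340, 91, 18),
    "GraphicZone": (300, 7, 10),
    "DigitizationArtefactZone": (28, 0, 0),
    "QuireMarksZone": (86, 9, 8),
    "StampZone": (39, 5, 4),
    "DamageZone": (12, 1, 0),
    "MusicZone": (179, 0, 0),
    "MusicLine": (167, 0, 0),
    "TitlePageZone": (4, 1, 1),
    "SealZone": (3, 0, 0),
}

# Total number of training annotations across all labels.
total_train = sum(train for train, _, _ in LABEL_COUNTS.values())

# Fraction of the training annotations that each label accounts for.
train_share = {
    label: counts[0] / total_train for label, counts in LABEL_COUNTS.items()
}

# Labels that cannot be evaluated because a split contains no examples.
no_validation = sorted(l for l, (_, val, _) in LABEL_COUNTS.items() if val == 0)
no_test = sorted(l for l, (_, _, test) in LABEL_COUNTS.items() if test == 0)

print(f"Total training annotations: {total_train}")
print(f"DefaultLine share of training set: {train_share['DefaultLine']:.1%}")
print("No validation examples:", ", ".join(no_validation))
print("No test examples:", ", ".join(no_test))
```

Labels that appear in `no_validation` or `no_test` have no held-out examples, so per-label metrics for them are not available from these splits.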
|
46 |
+
|
47 |
+
|
48 |
+
# How to Get Started with the Model
|
49 |
+
|
50 |
+
Use the code below to get started with the model. All models are trained with float16.
|
51 |
+
|
52 |
+
```
import os
from unittest.mock import patch

import matplotlib.patches as patches
import matplotlib.pyplot as plt
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.dynamic_module_utils import get_imports

MODEL_ID = "medieval-data/florence2-medieval-bbox-zone-detection"


# Mac solution => https://huggingface.co/microsoft/Florence-2-large-ft/discussions/4
def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    """Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
    imports = get_imports(filename)
    # Drop the flash_attn requirement only for the Florence-2 modeling file.
    if str(filename).endswith("/modeling_florence2.py") and "flash_attn" in imports:
        imports.remove("flash_attn")
    return imports


# Load the model and processor while the patched import check is active.
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)


def process_image(url):
    """Download an image and run the object-detection task on it."""
    prompt = "<OD>"

    # Convert to RGB so grayscale or palette images are handled as well.
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

    inputs = processor(text=prompt, images=image, return_tensors="pt")

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Parse the generated text into bounding boxes and labels in pixel coordinates.
    result = processor.post_process_generation(
        generated_text, task="<OD>", image_size=(image.width, image.height)
    )
    return result, image


image_url = "https://huggingface.co/datasets/CATMuS/medieval-segmentation/resolve/main/data/train/cambridge-corpus-christi-college-ms-111/page-002-of-003.jpg"

result, image = process_image(image_url)
fig, ax = plt.subplots(1, figsize=(15, 15))
ax.imshow(image)

# Add bounding boxes and labels to the plot
for bbox, label in zip(result['<OD>']['bboxes'], result['<OD>']['labels']):
    # Boxes are [x1, y1, x2, y2]; Rectangle needs the corner plus width and height.
    x, y, width, height = bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]
    rect = patches.Rectangle((x, y), width, height, linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    plt.text(x, y, label, fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))

# Display the plot
plt.show()
```
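
Beyond plotting, it is often convenient to group the detections by label, for example to count zones or to crop them later. The following is a minimal sketch that assumes the result has the same shape the plotting loop relies on: a dictionary keyed by `"<OD>"` holding parallel `bboxes` (as `[x1, y1, x2, y2]`) and `labels` lists. The helper name `boxes_by_label` and the `example_result` values are hypothetical and used only for illustration.

```python
from collections import defaultdict


def boxes_by_label(result, task="<OD>"):
    """Group detections by label, converting each box to x/y/width/height."""
    detections = result[task]
    grouped = defaultdict(list)
    for bbox, label in zip(detections["bboxes"], detections["labels"]):
        x1, y1, x2, y2 = bbox
        grouped[label].append(
            {"x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}
        )
    return dict(grouped)


# Hypothetical output, shaped like the result returned by process_image.
example_result = {
    "<OD>": {
        "bboxes": [
            [10.0, 20.0, 110.0, 220.0],
            [120.0, 20.0, 150.0, 60.0],
            [5.0, 5.0, 40.0, 15.0],
        ],
        "labels": ["MainZone", "DropCapitalZone", "MainZone"],
    }
}

grouped = boxes_by_label(example_result)
for label, boxes in grouped.items():
    print(label, len(boxes))
```

Passing the `result` returned by `process_image` instead of `example_result` should produce the same structure for a real page.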
|