wjbmattingly committed
Commit e653cce • 1 Parent(s): 152f24a

Update README.md

Files changed (1)
  1. README.md +116 -3
README.md CHANGED
@@ -1,3 +1,116 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ datasets:
+ - CATMuS/medieval-segmentation
+ pipeline_tag: object-detection
+ tags:
+ - medieval
+ - manuscript
+ ---
+
+ # Florence 2 Medieval Zone Object Detection
+
+ This is Microsoft's Florence 2 model fine-tuned for 10 epochs on the [CATMuS Medieval Segmentation dataset](https://huggingface.co/datasets/CATMuS/medieval-segmentation) with a learning rate of `1e-6`. The code for fine-tuning can be found [here](). A blog post describing the process can be found [here](). This model would not be possible without the numerous annotators behind the various datasets available on HTR-United (see the dataset for details). Special thanks to [Thibault Clérice](https://huggingface.co/ponteineptique), who converted the original CATMuS dataset (for HTR) into a segmentation dataset.
+
+ # Model Details
+
+ - **Developed by**: [William J.B. Mattingly](https://huggingface.co/wjbmattingly)
+ - **License**: CC-BY 4.0
+ - **Finetuned from model**: [Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)
+
+ ## Labels
+
+ The table below lists each label, marks (✓) in the Zone and Line columns the labels used to train this model, gives the number of instances in each split (an image can contain several), and defines each label with a link to the original SegmOnto documentation.
+
+ | Label | Zone | Line | Train Count | Validation Count | Test Count | Definition |
+ |-------|------|------|-------------|------------------|------------|------------|
+ | DefaultLine | | ✓ | 81702 | 13554 | 12209 | [A line of text that is not distinguished by any particular features and is part of the main text flow.](https://segmonto.github.io/gd/gdL/DefaultLine/) |
+ | InterlinearLine | | ✓ | 2808 | 27 | 2234 | [A line of text written between two lines of main text, typically containing glosses, translations, or comments.](https://segmonto.github.io/gd/gdL/InterlinearLine/) |
+ | MainZone | ✓ | | 2314 | 365 | 275 | [The main textual zone of a page, usually containing the main body of text.](https://segmonto.github.io/gd/gdZ/MainZone/) |
+ | HeadingLine | | ✓ | 1381 | 701 | 135 | [A line of text that functions as a heading or title for a section of the main text.](https://segmonto.github.io/gd/gdL/HeadingLine/) |
+ | MarginTextZone | ✓ | | 916 | 146 | 199 | [A text zone in the margin of a page, often containing annotations, commentaries, or other secondary information.](https://segmonto.github.io/gd/gdZ/MarginTextZone/) |
+ | DropCapitalZone | ✓ | | 1566 | 102 | 124 | [A zone containing a large ornamental initial letter of a paragraph or section, typically extending below the first line of text.](https://segmonto.github.io/gd/gdZ/DropCapitalZone/) |
+ | NumberingZone | ✓ | | 632 | 102 | 94 | [A zone containing page numbers, folio numbers, or other numerical identifiers for the page.](https://segmonto.github.io/gd/gdZ/NumberingZone/) |
+ | TironianSignLine | | | 282 | 0 | 0 | [A line containing Tironian notes, an ancient system of shorthand.](https://segmonto.github.io/gd/gdL/TironianSignLine/) |
+ | DropCapitalLine | | | 1175 | 105 | 92 | [A line of text that begins with a drop capital.](https://segmonto.github.io/gd/gdL/DropCapitalLine/) |
+ | RunningTitleZone | ✓ | | 340 | 91 | 18 | [A zone containing a running title, typically located at the top of a page and repeating throughout a section or the entire document.](https://segmonto.github.io/gd/gdZ/RunningTitleZone/) |
+ | GraphicZone | ✓ | | 300 | 7 | 10 | [A zone containing non-textual elements such as images, drawings, or decorative elements.](https://segmonto.github.io/gd/gdZ/GraphicZone/) |
+ | DigitizationArtefactZone | | | 28 | 0 | 0 | [A zone containing artefacts from the digitization process, such as color bars or reference marks.](https://segmonto.github.io/gd/gdZ/DigitizationArtefactZone/) |
+ | QuireMarksZone | ✓ | | 86 | 9 | 8 | [A zone containing marks used to indicate the gathering or quire to which a leaf belongs, often found at the bottom of the page.](https://segmonto.github.io/gd/gdZ/QuireMarksZone/) |
+ | StampZone | ✓ | | 39 | 5 | 4 | [A zone containing a stamp, such as a library stamp or ownership mark.](https://segmonto.github.io/gd/gdZ/StampZone/) |
+ | DamageZone | ✓ | | 12 | 1 | 0 | [A zone indicating an area of the page that has been damaged or is otherwise illegible due to physical deterioration.](https://segmonto.github.io/gd/gdZ/DamageZone/) |
+ | MusicZone | ✓ | | 179 | 0 | 0 | [A zone containing musical notation.](https://segmonto.github.io/gd/gdZ/MusicZone/) |
+ | MusicLine | | | 167 | 0 | 0 | [A line containing musical notation.](https://segmonto.github.io/gd/gdL/MusicLine/) |
+ | TitlePageZone | ✓ | | 4 | 1 | 1 | [A zone encompassing the entire title page of a book or document.](https://segmonto.github.io/gd/gdZ/TitlePageZone/) |
+ | SealZone | ✓ | | 3 | 0 | 0 | [A zone containing a seal, typically used for authentication or closure of a document.](https://segmonto.github.io/gd/gdZ/SealZone/) |
+
+
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model. All models are trained with float16.
+
+ ```
+ import os
+ from unittest.mock import patch
+
+ import matplotlib.patches as patches
+ import matplotlib.pyplot as plt
+ import requests
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ from transformers.dynamic_module_utils import get_imports
+
+ MODEL_ID = "medieval-data/florence2-medieval-bbox-zone-detection"
+
+
+ # Mac solution => https://huggingface.co/microsoft/Florence-2-large-ft/discussions/4
+ def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
+     """Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
+     imports = get_imports(filename)
+     # Drop the flash_attn import so the model loads where flash_attn is not installed.
+     if str(filename).endswith("/modeling_florence2.py") and "flash_attn" in imports:
+         imports.remove("flash_attn")
+     return imports
+
+
+ # Load the model and processor while the import check is patched.
+ with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
+     model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
+     processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
+
+
+ def process_image(url):
+     """Download the image at `url` and run object detection on it."""
+     prompt = "<OD>"
+
+     # Convert to RGB so grayscale or palette images are handled as well.
+     image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+
+     inputs = processor(text=prompt, images=image, return_tensors="pt")
+
+     generated_ids = model.generate(
+         input_ids=inputs["input_ids"],
+         pixel_values=inputs["pixel_values"],
+         max_new_tokens=1024,
+         do_sample=False,
+         num_beams=3,
+     )
+     generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
+
+     # Parse the generated text into bounding boxes and labels scaled to the image size.
+     result = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
+     return result, image
+
+
+ image_url = "https://huggingface.co/datasets/CATMuS/medieval-segmentation/resolve/main/data/train/cambridge-corpus-christi-college-ms-111/page-002-of-003.jpg"
+
+ result, image = process_image(image_url)
+ fig, ax = plt.subplots(1, figsize=(15, 15))
+ ax.imshow(image)
+
+ # Add bounding boxes and labels to the plot
+ for bbox, label in zip(result['<OD>']['bboxes'], result['<OD>']['labels']):
+     # Boxes are [x1, y1, x2, y2]; Rectangle expects a corner plus width and height.
+     x, y, width, height = bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]
+     rect = patches.Rectangle((x, y), width, height, linewidth=2, edgecolor='r', facecolor='none')
+     ax.add_patch(rect)
+     ax.text(x, y, label, fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))
+
+ # Display the plot
+ plt.show()
+ ```
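
The README example reads the detections from `result['<OD>']`, which holds parallel `bboxes` and `labels` lists with each box given as `[x1, y1, x2, y2]`. As a minimal sketch, assuming that shape, the helper below keeps only the labels of interest and converts each box to `(x, y, width, height)`. The function name `filter_detections` and the `sample` dictionary are hypothetical and made up for illustration; they are not part of the model's API or actual model output.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]


def filter_detections(
    result: Dict[str, dict], keep: Iterable[str], task: str = "<OD>"
) -> List[Tuple[str, Box]]:
    """Return (label, (x, y, width, height)) pairs for the labels in `keep`."""
    wanted = set(keep)
    detections = []
    # Walk the parallel box and label lists stored under the task key.
    for (x1, y1, x2, y2), label in zip(result[task]["bboxes"], result[task]["labels"]):
        if label in wanted:
            detections.append((label, (x1, y1, x2 - x1, y2 - y1)))
    return detections


# Made-up detections in the assumed result format (not real model output).
sample = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [15.0, 25.0, 100.0, 40.0]],
        "labels": ["MainZone", "DefaultLine"],
    }
}

zones = filter_detections(sample, keep=["MainZone", "MarginTextZone"])
print(zones)
```

Filtering on the parsed dictionary rather than on the generated text keeps the helper independent of the model, so it can be tried on saved results without reloading Florence 2.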