+---
+license: cc-by-4.0
+datasets:
+- CATMuS/medieval-segmentation
+pipeline_tag: object-detection
+tags:
+- medieval
+- manuscript
+---
+# Florence 2 Medieval Zone Object Detection
+This is Microsoft's Florence-2 model, fine-tuned for 10 epochs on the [CATMuS Medieval Segmentation dataset](https://huggingface.co/datasets/CATMuS/medieval-segmentation) with a learning rate of `1e-6`. The fine-tuning code can be found [here](). A blog post describing the process can be found [here](). This model would not be possible without the numerous annotators behind the various datasets available on HTR-United (see the dataset for details). Special thanks to [Thibault Clérice](https://huggingface.co/ponteineptique), who converted the original CATMuS dataset (for HTR) into a segmentation dataset.
+# Model Details
+- **Developed by**: [William J.B. Mattingly](https://huggingface.co/wjbmattingly)
+- **License**: CC-BY 4.0
+- **Finetuned from model**: [Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)
+## Labels
+The following table lists the labels in the dataset, indicates which ones were used to train this model, gives the number of instances of each label per split (an image can contain several instances), and provides each label's definition with a link to the original documentation.
+| Label | Zone | Line | Train Count | Validation Count | Test Count | Definition |
+|-------|------|------|-------------|------------------|------------|------------|
+| DefaultLine |  | ✓ | 81702 | 13554 | 12209 | [A line of text that is not distinguished by any particular features and is part of the main text flow.](https://segmonto.github.io/gd/gdL/DefaultLine/) |
+| InterlinearLine |  | ✓ | 2808 | 27 | 2234 | [A line of text written between two lines of main text, typically containing glosses, translations, or comments.](https://segmonto.github.io/gd/gdL/InterlinearLine/) |
+| MainZone | ✓ |  | 2314 | 365 | 275 | [The main textual zone of a page, usually containing the main body of text.](https://segmonto.github.io/gd/gdZ/MainZone/) |
+| HeadingLine |  | ✓ | 1381 | 701 | 135 | [A line of text that functions as a heading or title for a section of the main text.](https://segmonto.github.io/gd/gdL/HeadingLine/) |
+| MarginTextZone | ✓ |  | 916 | 146 | 199 | [A text zone in the margin of a page, often containing annotations, commentaries, or other secondary information.](https://segmonto.github.io/gd/gdZ/MarginTextZone/) |
+| DropCapitalZone | ✓ |  | 1566 | 102 | 124 | [A zone containing a large ornamental initial letter of a paragraph or section, typically extending below the first line of text.](https://segmonto.github.io/gd/gdZ/DropCapitalZone/) |
+| NumberingZone | ✓ |  | 632 | 102 | 94 | [A zone containing page numbers, folio numbers, or other numerical identifiers for the page.](https://segmonto.github.io/gd/gdZ/NumberingZone/) |
+| TironianSignLine |  |  | 282 | 0 | 0 | [A line containing Tironian notes, an ancient system of shorthand.](https://segmonto.github.io/gd/gdL/TironianSignLine/) |
+| DropCapitalLine |  |  | 1175 | 105 | 92 | [A line of text that begins with a drop capital.](https://segmonto.github.io/gd/gdL/DropCapitalLine/) |
+| RunningTitleZone | ✓ |  | 340 | 91 | 18 | [A zone containing a running title, typically located at the top of a page and repeating throughout a section or the entire document.](https://segmonto.github.io/gd/gdZ/RunningTitleZone/) |
+| GraphicZone | ✓ |  | 300 | 7 | 10 | [A zone containing non-textual elements such as images, drawings, or decorative elements.](https://segmonto.github.io/gd/gdZ/GraphicZone/) |
+| DigitizationArtefactZone |  |  | 28 | 0 | 0 | [A zone containing artefacts from the digitization process, such as color bars or reference marks.](https://segmonto.github.io/gd/gdZ/DigitizationArtefactZone/) |
+| QuireMarksZone | ✓ |  | 86 | 9 | 8 | [A zone containing marks used to indicate the gathering or quire to which a leaf belongs, often found at the bottom of the page.](https://segmonto.github.io/gd/gdZ/QuireMarksZone/) |
+| StampZone | ✓ |  | 39 | 5 | 4 | [A zone containing a stamp, such as a library stamp or ownership mark.](https://segmonto.github.io/gd/gdZ/StampZone/) |
+| DamageZone | ✓ |  | 12 | 1 | 0 | [A zone indicating an area of the page that has been damaged or is otherwise illegible due to physical deterioration.](https://segmonto.github.io/gd/gdZ/DamageZone/) |
+| MusicZone | ✓ |  | 179 | 0 | 0 | [A zone containing musical notation.](https://segmonto.github.io/gd/gdZ/MusicZone/) |
+| MusicLine |  |  | 167 | 0 | 0 | [A line containing musical notation.](https://segmonto.github.io/gd/gdL/MusicLine/) |
+| TitlePageZone | ✓ |  | 4 | 1 | 1 | [A zone encompassing the entire title page of a book or document.](https://segmonto.github.io/gd/gdZ/TitlePageZone/) |
+| SealZone | ✓ |  | 3 | 0 | 0 | [A zone containing a seal, typically used for authentication or closure of a document.](https://segmonto.github.io/gd/gdZ/SealZone/) |
+# How to Get Started with the Model
+Use the code below to get started with the model. All models are trained with float16.
+```python
+import os
+from unittest.mock import patch
+
+import matplotlib.patches as patches
+import matplotlib.pyplot as plt
+import requests
+from PIL import Image
+from transformers import AutoModelForCausalLM, AutoProcessor
+from transformers.dynamic_module_utils import get_imports
+
+# Workaround for machines without flash_attn (e.g. Macs), see
+# https://huggingface.co/microsoft/Florence-2-large-ft/discussions/4
+def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
+    """Workaround for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
+    imports = get_imports(filename)
+    # Drop the optional flash_attn dependency, but only for the Florence-2
+    # modeling file and only if it is actually listed (avoids a ValueError).
+    if str(filename).endswith("/modeling_florence2.py") and "flash_attn" in imports:
+        imports.remove("flash_attn")
+    return imports
+with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
+    model = AutoModelForCausalLM.from_pretrained("medieval-data/florence2-medieval-bbox-zone-detection", trust_remote_code=True)
+    processor = AutoProcessor.from_pretrained("medieval-data/florence2-medieval-bbox-zone-detection", trust_remote_code=True)
+def process_image(url):
+    """Run object detection on the image at `url` and return (result, image)."""
+    prompt = "<OD>"
+    # Convert to RGB so that grayscale or RGBA scans have the three channels the processor expects
+    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+    inputs = processor(text=prompt, images=image, return_tensors="pt")
+    generated_ids = model.generate(
+        input_ids=inputs["input_ids"],
+        pixel_values=inputs["pixel_values"],
+        max_new_tokens=1024,
+        do_sample=False,
+        num_beams=3
+    )
+    # Keep special tokens: post-processing parses them to recover the boxes
+    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
+    result = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))
+    return result, image
+url = "https://huggingface.co/datasets/CATMuS/medieval-segmentation/resolve/main/data/train/cambridge-corpus-christi-college-ms-111/page-002-of-003.jpg"
+result, image = process_image(url)
+fig, ax = plt.subplots(1, figsize=(15, 15))
+ax.imshow(image)
+# Draw each predicted box with its label; each bbox is (x_min, y_min, x_max, y_max) in pixels
+for bbox, label in zip(result['<OD>']['bboxes'], result['<OD>']['labels']):
+    x, y, width, height = bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]
+    rect = patches.Rectangle((x, y), width, height, linewidth=2, edgecolor='r', facecolor='none')
+    ax.add_patch(rect)
+    ax.text(x, y, label, fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))
+# Display the plot
+plt.show()
+```
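
The `result` returned by `process_image` holds parallel lists of bounding boxes and labels under the `<OD>` key. For downstream work it may be handier to have one record per detection, grouped into zones and lines. The sketch below is one possible way to do that and is not part of the model's API: the helper names `detections_to_records` and `group_by_kind` are hypothetical, the grouping assumes that label names end in `Zone` or `Line` as they do in the table above, and the `example` dictionary is hand-written rather than real model output.

```python
from collections import defaultdict


def detections_to_records(result, task="<OD>"):
    """Flatten a result dict into one record per detection (hypothetical helper)."""
    detections = result[task]
    records = []
    for (x_min, y_min, x_max, y_max), label in zip(detections["bboxes"], detections["labels"]):
        records.append({
            "label": label,
            "x": x_min,
            "y": y_min,
            "width": x_max - x_min,
            "height": y_max - y_min,
        })
    return records


def group_by_kind(records):
    """Group records into 'zone', 'line', or 'other' based on the label suffix."""
    groups = defaultdict(list)
    for record in records:
        if record["label"].endswith("Zone"):
            kind = "zone"
        elif record["label"].endswith("Line"):
            kind = "line"
        else:
            kind = "other"
        groups[kind].append(record)
    return dict(groups)


# Hand-written example with the same shape as `result` (not real model output).
example = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [15.0, 30.0, 105.0, 45.0]],
        "labels": ["MainZone", "DefaultLine"],
    }
}
groups = group_by_kind(detections_to_records(example))
print(sorted(groups))  # ['line', 'zone']
```

With real output, `group_by_kind(detections_to_records(result))["zone"]` would give the zone-level boxes only, for example to crop regions before passing them to an HTR model.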