hantian/yolo-doclaynet

hantian committed on May 20, 2024

Commit dfabea3 (verified) · 1 Parent(s): b78dd24

Update README.md

Files changed (1)

README.md +52 -2


@@ -8,6 +8,56 @@ tags:
 - document-analysis
 ---
-yolo-doclaynet
-https://github.com/ppaanngggg/yolo-doclaynet

+**For more details, see the [GitHub repository](https://github.com/ppaanngggg/yolo-doclaynet).**
+## Introduction
+Retrieval-augmented generation (RAG) is very popular these days, and many applications now support chatting with
+documents. However, performance drops sharply on complex documents because their structure is hard to parse.
+Extracting content from a complex document and organizing it into a parsable form therefore remains a challenge.
+This repo aims to address that challenge with a method that is both fast and accurate.
+## Detection Sample
+![image](https://github.com/ppaanngggg/yolo-doclaynet/raw/main/annotated-test.png)
+## Method
+1. `YOLO` is one of the most advanced detection models, developed by [Ultralytics](https://github.com/ultralytics/ultralytics).
+   It comes in 5 base model sizes and has a powerful framework for training and deployment, so I chose YOLO to
+   solve this challenge.
+2. `DocLayNet` is a human-annotated document layout segmentation dataset containing 80863 pages from a broad variety of
+   document sources. As far as I know, it is the highest-quality dataset for document layout analysis.
+## Usage
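+```python
+from ultralytics import YOLO
+
+# Load the trained layout-detection weights.
+model = YOLO("{path to model file}")
+# Run inference; the call returns a list of results, one per input image.
+pred = model("{path to test image}")
+print(pred)
+```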
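
Printing the raw prediction is verbose, so it can help to flatten it into plain records first. The following is a minimal sketch, not part of the original README. It assumes the standard Ultralytics result objects (`boxes`, `names`, and per-box `cls`, `conf`, `xyxy`), and the helper names `summarize_detections` and `detect_layout` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def summarize_detections(results: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten detection results into one plain dictionary per detected box."""
    rows: List[Dict[str, Any]] = []
    for image_index, result in enumerate(results):
        for box in result.boxes:
            class_id = int(box.cls[0])  # numeric class index predicted for this box
            rows.append(
                {
                    "image": image_index,
                    "label": result.names[class_id],  # e.g. "Table" or "Text"
                    "confidence": float(box.conf[0]),
                    # Corner coordinates in pixels: [x1, y1, x2, y2].
                    "xyxy": [float(v) for v in box.xyxy[0]],
                }
            )
    return rows


def detect_layout(model_path: str, image_path: str) -> List[Dict[str, Any]]:
    """Hypothetical convenience wrapper: load the weights, run one image, summarize."""
    from ultralytics import YOLO  # imported lazily so the helper above has no hard dependency

    model = YOLO(model_path)
    return summarize_detections(model(image_path))
```

Calling `detect_layout("{path to model file}", "{path to test image}")` would then return a list of dictionaries that is easy to filter, sort, or serialize as JSON.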
+## Dataset
+More details about DocLayNet, including how to download it, are available in its [repository](https://github.com/DS4SD/DocLayNet). It has 11 labels:
+- **Text**: Regular paragraphs.
+- **Picture**: A graphic or photograph.
+- **Caption**: Special text outside a picture or table that introduces this picture or
+  table.
+- **Section-header**: Any kind of heading in the text, except overall document title.
+- **Footnote**: Typically small text at the bottom of a page, with a number or symbol
+  that is referred to in the text above.
+- **Formula**: Mathematical equation on its own line.
+- **Table**: Material arranged in a grid alignment with rows and columns, often
+  with separator lines.
+- **List-item**: One element of a list, in a hanging shape, i.e., from the second line
+  onwards the paragraph is indented more than the first line.
+- **Page-header**: Repeating elements like page number at the top, outside of the
+  normal text flow.
+- **Page-footer**: Repeating elements like page number at the bottom, outside of the
+  normal text flow.
+- **Title**: Overall title of a document, (almost) exclusively on the first page and
+  typically appearing in large font.
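
These labels make it possible to organize detections into a parsable form. The sketch below is one possible approach, not the method used in this repo. It assumes each detection is a dictionary with hypothetical keys `label`, `confidence`, and `xyxy` (`[x1, y1, x2, y2]` in pixels), assumes a single-column page, and uses an arbitrary confidence threshold of 0.5.

```python
from typing import Any, Dict, List

# Labels that repeat on every page and usually add noise to the extracted text.
PAGE_FURNITURE = {"Page-header", "Page-footer"}


def to_reading_order(
    detections: List[Dict[str, Any]], min_confidence: float = 0.5
) -> List[Dict[str, Any]]:
    """Drop page headers/footers and weak detections, then sort top to bottom."""
    kept = [
        d
        for d in detections
        if d["label"] not in PAGE_FURNITURE and d["confidence"] >= min_confidence
    ]
    # Sort by the top edge (y1) first, then by the left edge (x1).
    # This is only a reasonable reading order for single-column layouts.
    return sorted(kept, key=lambda d: (d["xyxy"][1], d["xyxy"][0]))
```

Multi-column pages would need a column-aware ordering step before the sort, which this sketch does not attempt.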