unstructuredio (unstructured.io)

Welcome to our space! 🎊

The Unstructured.io team provides open-source libraries for pre-processing text documents such as PDFs, HTML, and Word documents. The components are packaged as bricks 🧱, building blocks that users combine into pipelines tailored to the documents they care about. Bricks in the library fall into three categories:

🧩 Partitioning bricks that break raw documents down into standard, structured elements.
🧹 Cleaning bricks that remove unwanted text from documents, such as boilerplate and sentence fragments.
🎭 Staging bricks that format data for downstream tasks, such as ML inference and data labeling.
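
The three categories can be pictured as successive stages of a small pipeline. The sketch below is illustrative only: it uses the Python standard library, and the function names (partition_text, clean_elements, stage_for_labeling), the element dictionaries, and the output record shape are all hypothetical stand-ins, not the library's actual API.

```python
import json
import re


def partition_text(raw):
    """Partitioning step (hypothetical): split raw text into typed elements."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]
    elements = []
    for block in blocks:
        # Toy heuristic, assumed for illustration: a short single-line block
        # without closing punctuation is treated as a title.
        is_title = (
            "\n" not in block
            and len(block) < 60
            and not block.endswith((".", "!", "?"))
        )
        kind = "Title" if is_title else "NarrativeText"
        elements.append({"type": kind, "text": block})
    return elements


def clean_elements(elements, boilerplate_prefixes=("Page",)):
    """Cleaning step (hypothetical): normalize whitespace, drop boilerplate."""
    cleaned = []
    for element in elements:
        text = re.sub(r"\s+", " ", element["text"]).strip()
        if any(text.startswith(prefix) for prefix in boilerplate_prefixes):
            continue  # e.g. page footers such as "Page 1 of 3"
        cleaned.append({**element, "text": text})
    return cleaned


def stage_for_labeling(elements):
    """Staging step (hypothetical): serialize elements as JSON records."""
    records = [
        {"data": {"text": el["text"]}, "meta": {"type": el["type"]}}
        for el in elements
    ]
    return json.dumps(records, indent=2)


RAW = """Quarterly Report

Revenue   grew in the
third quarter.

Page 1 of 3
"""

elements = clean_elements(partition_text(RAW))
staged = stage_for_labeling(elements)
print(staged)
```

Each stage takes the previous stage's output, so stages can be swapped or extended independently, which is the idea behind composing bricks into a pipeline.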

In this space we explore different configurations of deep-learning models, each fine-tuned on datasets that contain a specific document type and its corresponding annotations.

Main GitHub repository link: here

unstructured.io

spaces 6

Unstructured Chipper App

Unstructured Chipper App

Irs Manuals

Receipt Parser

Chat Your Data ISW

Invoices Parser

models 7

unstructuredio/donut-base-labelstudio-A1.0

unstructuredio/yolo_x_layout

unstructuredio/detectron2_mask_rcnn_X_101_32x8d_FPN_3x

unstructuredio/donut-invoices

unstructuredio/detectron2_faster_rcnn_R_50_FPN_3x

unstructuredio/oer-checkbox

unstructuredio/donut-base-sroie

datasets

Team members 42
