Error when running the sample code
Hi! Thank you for your great work!
When I run the sample code, I get the following error:
Some weights of BeitModel were not initialized from the model checkpoint at cmarkea/dit-base-layout-detection and are newly initialized: ['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Do you have any idea about this? Thank you in advance.
Hi WYYexperiments, thank you.
This is not an error but a warning. To improve the model's performance, we trained it with the original cross-entropy loss combined with a cost function aimed at predicting the bounding boxes. The warning does not affect inference performance.
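Schematically, the combined objective looks something like the sketch below; the function name, the L1 box term, and the weighting factor are illustrative placeholders, not our exact training code:

import torch.nn.functional as F

def combined_loss(seg_logits, target_map, pred_boxes, target_boxes, bbox_weight=1.0):
    # Standard per-pixel cross-entropy on the segmentation logits:
    # seg_logits is (N, C, H, W), target_map is (N, H, W) with class ids.
    ce = F.cross_entropy(seg_logits, target_map)
    # Illustrative bounding-box cost, here a simple L1 regression term
    # between predicted and ground-truth box coordinates.
    bbox = F.l1_loss(pred_boxes, target_boxes)
    return ce + bbox_weight * bbox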
Hi Cyrile, thank you for the reply.
I get another error:
logits = outputs.logits
^^^^^^^^^^^^^^
AttributeError: 'BeitModelOutputWithPooling' object has no attribute 'logits'
The output has two tensors, 'pooler_output' and 'last_hidden_state', but no 'logits' attribute.
Can I get your help with this?
Yes, you are right, sorry.
We should not use AutoModel, but BeitForSemanticSegmentation.
I have modified the example accordingly, so it will now work (and the warning will no longer appear).
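For anyone hitting the same AttributeError, the corrected usage looks roughly like this (a minimal sketch; the image path is a placeholder):

from PIL import Image
from transformers import AutoImageProcessor, BeitForSemanticSegmentation

img_proc = AutoImageProcessor.from_pretrained("cmarkea/dit-base-layout-detection")
model = BeitForSemanticSegmentation.from_pretrained("cmarkea/dit-base-layout-detection")

img = Image.open("page.png").convert("RGB")  # placeholder path
output = model(**img_proc(img, return_tensors="pt"))
logits = output.logits  # the semantic-segmentation output does have logits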
I get this when I run your code to convert masks to bounding boxes:
bbx, lab = detect_bboxes(mm.numpy())
^^^^^^^^
ValueError: too many values to unpack (expected 2)
It seems the detected_blocks value does not include label information.
May I have the code for the visualization you put on the model card?
Hi, yes of course, here is an untested and non-debugged code snippet. Feel free to adapt it according to your needs.
from collections import OrderedDict

import matplotlib.pyplot as plt
import torch
from einops import rearrange
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor
from torchvision.utils import draw_segmentation_masks
from transformers import AutoImageProcessor, BeitForSemanticSegmentation

# Same setup as in the corrected sample above.
img_proc = AutoImageProcessor.from_pretrained("cmarkea/dit-base-layout-detection")
model = BeitForSemanticSegmentation.from_pretrained("cmarkea/dit-base-layout-detection")
img = Image.open("page.png").convert("RGB")  # placeholder path
with torch.inference_mode():
    output = model(**img_proc(img, return_tensors="pt"))

# Colors for the 11 classes; class id 0 is the background.
map_color = OrderedDict(
    [("Caption", "red"),
     ("Footnote", "yellowgreen"),
     ("Formula", "skyblue"),
     ("List-item", "magenta"),
     ("Page-footer", "red"),
     ("Page-header", "darkorange"),
     ("Picture", "gold"),
     ("Section-header", "indigo"),
     ("Table", "sienna"),
     ("Text", "slategray"),
     ("Title", "teal")]
)

# Resize the predicted segmentation map back to the original image size.
segmentation = img_proc.post_process_semantic_segmentation(
    output, target_sizes=[img.size[::-1]]
)

img_tensor = pil_to_tensor(img)
colors, masks, labels = [], [], []
for ii, (label, color) in enumerate(map_color.items()):
    mask = segmentation[0] == (ii + 1)  # class ids start at 1
    if mask.sum() > 0:
        masks.append(mask)
        labels.append(label)
        colors.append(color)

# Overlay each detected class mask on the page with its color.
masks = torch.stack(masks)
drawn_seg = draw_segmentation_masks(img_tensor, masks, alpha=0.5, colors=colors)
im_seg = Image.fromarray(rearrange(drawn_seg, 'C H W -> H W C').numpy())
plt.imshow(im_seg)
plt.axis("off")
plt.show()
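Regarding the earlier unpacking error with detect_bboxes: a sketch of a variant that returns labels alongside the boxes could look like the following. detect_bboxes_with_labels is a hypothetical helper (not the exact code from the card) and assumes OpenCV is installed:

import cv2
import numpy as np

def detect_bboxes_with_labels(seg_map: np.ndarray):
    # seg_map: (H, W) integer array of class ids, as produced by
    # post_process_semantic_segmentation, with 0 as background.
    bboxes, labels = [], []
    for class_id in np.unique(seg_map):
        if class_id == 0:  # skip background
            continue
        # Binarize the current class and turn each outer contour into a box.
        contours, _ = cv2.findContours(
            (seg_map == class_id).astype(np.uint8),
            cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE,
        )
        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            bboxes.append([x, y, x + w, y + h])
            labels.append(int(class_id))
    return bboxes, labels

# Usage: bbx, lab = detect_bboxes_with_labels(segmentation[0].numpy())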
Hi Cyrile, thank you for your kind help. Your model's performance is impressive.
Can you share more information about the training you did?
I could only find the DiT base checkpoints; there are no object detection checkpoints from Microsoft available online.
It seems we need to build the model ourselves.
You will find information about the training here: https://huggingface.co./cmarkea/dit-base-layout-detection/discussions/1#66bf025c43a701a8376ebd1c
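If you do want to start from the base DiT checkpoint yourself, a sketch of the initialization could look like this (the label list follows the DocLayNet classes from the card; everything else about the fine-tuning setup is up to you):

from transformers import BeitForSemanticSegmentation

labels = ["Background", "Caption", "Footnote", "Formula", "List-item",
          "Page-footer", "Page-header", "Picture", "Section-header",
          "Table", "Text", "Title"]
model = BeitForSemanticSegmentation.from_pretrained(
    "microsoft/dit-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)
# The segmentation head is newly initialized here and must be
# fine-tuned (e.g. on DocLayNet) before it gives useful predictions.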