
Proc-RoBERTa

Proc-RoBERTa is a pre-trained language model for procedural text. It was built by fine-tuning the RoBERTa-base model on a procedural corpus of PubMed articles, chemical patents, and cooking recipes, which contains 1.05B tokens. More details can be found in the following paper:

@inproceedings{bai-etal-2021-pre,
    title = "Pre-train or Annotate? Domain Adaptation with a Constrained Budget",
    author = "Bai, Fan  and
              Ritter, Alan  and
              Xu, Wei",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
}

Usage

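from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("fbaigt/proc_roberta")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/proc_roberta")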

More usage details can be found here.
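
Because the model is loaded for token classification, training data usually has to be aligned from word-level labels to RoBERTa's subword tokens. The following is a minimal sketch of one common way to do that, not the authors' pipeline. The helper names `align_labels_to_subwords` and `encode_example` and the integer label ids are hypothetical, and this card does not specify a label set. It also describes only a pre-trained language model, so the classification head presumably needs fine-tuning on labeled data before its predictions are meaningful.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = -100  # label id that PyTorch's cross-entropy loss skips by default


def align_labels_to_subwords(
    word_ids: Sequence[Optional[int]],
    word_labels: Sequence[int],
    ignore_index: int = IGNORE_INDEX,
) -> List[int]:
    """Expand word-level label ids to subword level.

    Only the first subword of each word keeps the word's label; special
    tokens (word id None) and continuation subwords get `ignore_index`.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(ignore_index)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


def encode_example(words, word_labels, model_name="fbaigt/proc_roberta"):
    """Hypothetical helper: tokenize pre-split words and attach aligned labels.

    Downloads the tokenizer on first use, so nothing is fetched at import time.
    """
    from transformers import AutoTokenizer  # imported lazily; heavy dependency

    # RoBERTa's byte-level BPE needs add_prefix_space=True for pre-split input.
    tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
    encoding = tokenizer(list(words), is_split_into_words=True, truncation=True)
    encoding["labels"] = align_labels_to_subwords(encoding.word_ids(), word_labels)
    return encoding
```

Calling `encode_example(["Heat", "the", "mixture"], [1, 0, 0])` would return input ids together with a `labels` list of the same length. Labelling only the first subword of each word is one convention among several; labelling every subword is an equally valid choice.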
