---
library_name: transformers
license: mit
model_name: MBart-Urdu-Text-Summarization
pipeline_tag: summarization
tags:
- text-generation
- mbart
- nlp
- transformers
- text-generation-inference
author: Wali Muhammad Ahmad
private: false
gated: false
inference: true
---
# Model Card for MBart-Urdu-Text-Summarization
MBart-Urdu-Text-Summarization is a fine-tuned MBart model for summarizing Urdu text. It leverages the multilingual capabilities of MBart to generate concise, accurate summaries of Urdu paragraphs.
## Model Details
### Model Description
This model is based on the MBart architecture, which is a sequence-to-sequence model pre-trained on multilingual data. It has been fine-tuned specifically for Urdu text summarization tasks. The model is capable of understanding and generating text in both English and Urdu, making it suitable for multilingual applications.
### Model Sources
- **Repository:** [WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART](https://github.com/WaliMuhammadAhmad/UrduTextSummarizationUsingm-BART)
- **Paper:** [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210)
## Uses
### Direct Use
This model can be used directly for Urdu text summarization tasks. It is suitable for applications such as news summarization, document summarization, and content generation.
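For quick experiments, the checkpoint can also be loaded through the `summarization` pipeline. This is a minimal sketch, assuming the model id used in the getting-started example later in this card; the input string is only a placeholder:

```python
# Minimal sketch: load the checkpoint via the summarization pipeline.
# The model id matches the getting-started example below.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="ihatenlp/MBart-Urdu-Text-Summarization",
)

urdu_text = "Enter your Urdu paragraph here."  # placeholder input
result = summarizer(urdu_text, max_length=50, num_beams=4)
print(result[0]["summary_text"])
```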
### Downstream Use
The model can be fine-tuned for related downstream tasks such as sentiment analysis, question answering, or machine translation between Urdu and English.
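As one concrete example, further sequence-to-sequence fine-tuning can use the `Seq2SeqTrainer` API from transformers. The sketch below is a rough illustration, not a recipe from the authors: the dataset id `my_urdu_dataset` and its `text`/`summary` columns are hypothetical placeholders, and the hyperparameters are untested defaults.

```python
# Hedged sketch of seq2seq fine-tuning with the Trainer API.
# Dataset id and column names below are hypothetical placeholders.
from transformers import (
    AutoTokenizer,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    DataCollatorForSeq2Seq,
)
from datasets import load_dataset

model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

dataset = load_dataset("my_urdu_dataset")  # hypothetical dataset id

def preprocess(batch):
    # Tokenize source texts and target summaries separately
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="mbart-urdu-finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```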
### Out-of-Scope Use
This model is not intended for generating biased, harmful, or misleading content. It should not be used for tasks outside of text summarization without proper fine-tuning and evaluation.
## Bias, Risks, and Limitations
- The model may generate biased or inappropriate content if the input text contains biases.
- It is trained on a specific dataset and may not generalize well to other domains or languages.
- The model's performance may degrade for very long input texts.
### Recommendations
Users should carefully evaluate the model's outputs for biases and appropriateness. Fine-tuning on domain-specific data is recommended for better performance in specialized applications.
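For a rough quantitative check, generated summaries can be scored against reference summaries with ROUGE. A minimal sketch using the `evaluate` library follows; the prediction and reference strings are placeholders, and ROUGE's default tokenization is English-oriented, so treat scores on Urdu text as approximate:

```python
# Minimal sketch: score generated summaries with ROUGE via `evaluate`.
# Strings are placeholders; Urdu ROUGE scores are only a rough signal.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated Urdu summary"],
    references=["reference Urdu summary"],
)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum
```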
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Load the fine-tuned model and tokenizer
model_name = "ihatenlp/MBart-Urdu-Text-Summarization"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Example input text
input_text = "Enter your Urdu paragraph here."

# Tokenize, truncating inputs that exceed the encoder's maximum length
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)

# Generate a summary with beam search
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=50,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
## Citation
**BibTeX:**

```bibtex
@misc{liu2020multilingualdenoisingpretrainingneural,
      title={Multilingual Denoising Pre-training for Neural Machine Translation},
      author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
      year={2020},
      eprint={2001.08210},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2001.08210},
}
```
## Model Card Authors
- Wali Muhammad Ahmad
- Muhammad Labeeb Tariq
## Model Card Contact
- Email: [email protected]
- Hugging Face Profile: Wali Muhammad Ahmad