Ransaka/mbart-large-cc25-8bit

	## About

	This is an 8-bit quantized version of Facebook's mBART model (`mbart-large-cc25`).

	According to the paper's abstract, mBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual corpora in many languages using the BART objective. It is one of the first methods for pretraining a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches focused only on the encoder, the decoder, or reconstructing parts of the text.
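To make the denoising idea concrete, here is a toy sketch of how a text could be corrupted so that a sequence-to-sequence model learns to reconstruct the original. It is an illustration only, not the authors' noise function: the `corrupt` helper, the `<mask>` string, the `mask_ratio` default, and the choice to mask one contiguous span per sentence and shuffle sentence order are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # placeholder token, chosen for this example only


def corrupt(sentences, mask_ratio=0.3, seed=0):
    """Return a noised copy of `sentences` (a list of strings).

    Two toy noise steps:
      1. shuffle the sentence order;
      2. in each sentence, replace one contiguous span of words with MASK.
    """
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)

    noised = []
    for sentence in shuffled:
        words = sentence.split()
        if not words:  # nothing to mask in an empty sentence
            noised.append(sentence)
            continue
        # Span length: at least one word, at most the whole sentence.
        span = min(len(words), max(1, round(len(words) * mask_ratio)))
        start = rng.randrange(len(words) - span + 1)
        noised.append(" ".join(words[:start] + [MASK] + words[start + span:]))
    return noised


original = [
    "mBART is pretrained on monolingual corpora .",
    "It reconstructs full texts .",
]
noisy = corrupt(original)
# A training pair would be (noisy input, original target).
print(noisy)
```

The model sees the corrupted text as input and is trained to produce the uncorrupted text, so both the encoder and the decoder are exercised during pretraining.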

	The authors' code is available [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mbart).

	## Usage info

	Install the required packages (the leading `!` is for notebook cells; drop it in a regular shell):

	```!pip install -U transformers accelerate bitsandbytes sentencepiece```

	Then load the model with the 🤗 Transformers library:

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

	tokenizer = AutoTokenizer.from_pretrained("Ransaka/mbart-large-cc25-8bit")
	model = AutoModelForSeq2SeqLM.from_pretrained("Ransaka/mbart-large-cc25-8bit", device_map='auto')

	# If bitsandbytes loads successfully, you'll see output like this:
	# ===================================BUG REPORT===================================
	# Welcome to bitsandbytes. For bug reports, please run

	# python -m bitsandbytes

	# and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
	# ================================================================================
	# bin /opt/conda/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
	# CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
	# CUDA SETUP: Highest compute capability among GPUs detected: 6.0
	# CUDA SETUP: Detected CUDA version 113
	# CUDA SETUP: Loading binary /opt/conda/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...

	# Create a text2text-generation pipeline and run it on a sample passage
	text = """Right now, major tech firms are clamouring to replicate the runaway success of ChatGPT,
	the generative AI chatbot developed by OpenAI using its GPT-3 large language model.
	Much like potential game-changers of the past, such as cloud-based Software as a Service
	(SaaS) platforms or blockchain technology (emphasis on potential), established companies
	and start-ups alike are going public with LLMs and ChatGPT alternatives in fear of being left behind.
	"""
	pipe = pipeline('text2text-generation', model=model, tokenizer=tokenizer)
	pipe(text)
	#[{'generated_text': 'theore, major tech are clamouring to replicate the generative AI chatbot developed by OpenAI using its AI'}]

	print("Model memory usage: {:.2f} MB".format(pipe.model.get_memory_footprint()/1e6))
	# 'Model memory usage: 1893.99 MB'

	```
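The footprint printed above is reported in bytes and divided by 1e6 to get MB. As a rough way to reason about such numbers, the sketch below adds up storage per dtype. In an 8-bit model loaded through bitsandbytes, typically only the linear-layer weights are stored as int8 while other tensors (embeddings, layer norms) stay in higher precision, so the total is not simply one byte per parameter. The `footprint_mb` helper and the element counts are hypothetical, for illustration only; they are not measurements of this checkpoint.

```python
# Bytes per element for a few common dtypes.
BYTES_PER_ELEMENT = {"int8": 1, "float16": 2, "float32": 4}


def footprint_mb(element_counts):
    """Return the size in MB (1 MB = 1e6 bytes) of a model described as a
    mapping from dtype name to the number of elements stored in that dtype."""
    total_bytes = sum(
        BYTES_PER_ELEMENT[dtype] * count for dtype, count in element_counts.items()
    )
    return total_bytes / 1e6


# Hypothetical split, for illustration only (not this model's real numbers):
# 300M int8 linear weights plus 250M float16 values elsewhere.
mixed = footprint_mb({"int8": 300_000_000, "float16": 250_000_000})
# The same 550M elements stored entirely in float32.
full = footprint_mb({"float32": 550_000_000})

print("Mixed int8/float16: {:.2f} MB".format(mixed))
print("All float32:        {:.2f} MB".format(full))
```

For the actual model, `pipe.model.get_memory_footprint()` remains the number to trust, since it is computed from the loaded tensors themselves.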