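---
language:
- ru
base_model: t-tech/T-pro-it-1.0
tags:
- vllm
- bnb
- bitsandbytes
- 8bit
---

# vitekkor/T-pro-it-1.0-bnb-8bit

This model is an 8-bit bitsandbytes quantization of [`t-tech/T-pro-it-1.0`](https://huggingface.co/t-tech/T-pro-it-1.0). See the [original model card](https://huggingface.co/t-tech/T-pro-it-1.0) for details about the model itself.

## Use with transformers

Loading the 8-bit checkpoint requires the `bitsandbytes` and `accelerate` packages in addition to `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "vitekkor/T-pro-it-1.0-bnb-8bit"

# The bitsandbytes quantization config ships with the checkpoint,
# so no extra quantization arguments are passed here.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# "Write a poem about machine learning"
prompt = "Напиши стих про машинное обучение"
messages = [
    # "You are T-pro, a virtual assistant at T-Technologies. Your task is to be a helpful conversational assistant."
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the model's chat template and append the assistant-turn prefix.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
)
# Strip the prompt tokens so only the newly generated continuation is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Use with vLLM

### Python

```bash
pip install vllm
```

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "vitekkor/T-pro-it-1.0-bnb-8bit"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# vLLM's default max_tokens is 16, which would truncate the answer; 256 matches the transformers example.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
# Recent vLLM releases detect the bitsandbytes checkpoint automatically; older releases
# may also need quantization="bitsandbytes", load_format="bitsandbytes".
llm = LLM(model=MODEL_NAME, max_model_len=8192)

# "Write a poem about machine learning"
prompt = "Напиши стих про машинное обучение"
messages = [
    # "You are T-pro, a virtual assistant at T-Technologies. Your task is to be a helpful conversational assistant."
    {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
    {"role": "user", "content": prompt},
]

# With the default tokenize=True, apply_chat_template returns a list of token ids.
prompt_token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Pass the pre-tokenized prompt as a TokensPrompt-style dict.
outputs = llm.generate({"prompt_token_ids": prompt_token_ids}, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

### Server

```bash
vllm serve vitekkor/T-pro-it-1.0-bnb-8bit
```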
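`vllm serve` starts an OpenAI-compatible HTTP server (port 8000 by default). The sketch below queries its `/v1/chat/completions` endpoint using only the Python standard library. It assumes the server runs locally with default host and port and was started with the command above, so the served model name is the repository id. `build_chat_request` and `chat` are hypothetical helper names for illustration, not part of vLLM, and the sampling values simply mirror the Python example.

```python
import json
import urllib.request

# Assumption: `vllm serve vitekkor/T-pro-it-1.0-bnb-8bit` is running locally on
# vLLM's default port 8000; adjust BASE_URL if you passed --host/--port.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "vitekkor/T-pro-it-1.0-bnb-8bit"

# Same system prompt as the examples above (Russian, as the model expects).
SYSTEM_PROMPT = (
    "Ты T-pro, виртуальный ассистент в Т-Технологии. "
    "Твоя задача - быть полезным диалоговым ассистентом."
)


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        "model": MODEL_NAME,  # must match the model name the server was started with
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.8,
        "top_p": 0.95,
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # OpenAI-compatible responses put the assistant message under choices[0].
    return data["choices"][0]["message"]["content"]
```

For example, `print(chat("Напиши стих про машинное обучение"))` sends the same prompt as the examples above. Any OpenAI-compatible client pointed at `http://localhost:8000/v1` should work the same way; plain `urllib` is used here only to avoid extra dependencies.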