---
language:
- ru
---

# T-lite-instruct-0.1

**🚨 T-lite is designed for further fine-tuning and is not intended as a ready-to-use conversational assistant. Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.**

## Description

T-lite-instruct-0.1 is an instruct version of the T-lite-0.1 model.

T-lite-instruct-0.1 was trained in bf16.

### 📚 Dataset

#### Contexts

For the instruction dataset, the contexts are obtained from:
- Open-source English-language datasets (such as UltraFeedback, HelpSteer, SHP, and so on)
- Translations of English-language datasets through machine translation
- Synthetic grounded QA contexts, generated from pre-training datasets

The translated contexts are filtered using classifiers.

#### SFT

The responses to the contexts are generated by a strong model, and training is carried out exclusively on these responses. This avoids training the model on poor-quality translations.

#### Reward Modeling

The reward model (RM) is trained on the following preference pairs:
- Strong Model > Our Model
- Stronger Model > Weaker Model
- Chosen Translated Response > Rejected Translated Response
- Pairs from original English datasets

The translated preference data are pre-filtered by the RM ensemble.

#### Preference tuning

Two stages were used in preference tuning:
- Stage 1: SPiN on the responses of the teacher model (Strong Model > Our Model)
- Stage 2: SLiC-HF using our RM

## 📊 Benchmarks

Here we present the results of T-lite-instruct-0.1 on automatic benchmarks.

### 🏆 [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench)

This benchmark was carefully translated into Russian and scored with the [LLM Judge](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) codebase, using gpt-4-1106-preview as the judge.

| MT-Bench | Total | Turn_1 | Turn_2 | coding | humanities | math | reasoning | roleplay | stem | writing |
|------------------------------------------------|:---------:|:---------:|:---------:|:---------:|:----------:|:-------:|:---------:|:---------:|:-------:|:---------:|
| **T-lite-instruct-0.1** | **6.458** | **6.833** | 6.078 | 4.136 | **8.45** | 4.25 | **4.5** | **7.667** | **7.7** | 7.706 |
| gpt3.5-turbo-0125 | 6.373 | 6.423 | **6.320** | **6.519** | 7.474 | 4.75 | 4.15 | 6.333 | 6.7 | 7.588 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 6.051 | 6.577 | 5.526 | 4.318 | 8.0 | 4.0 | 3.6 | 7.056 | 6.7 | **7.889** |
| Qwen2-7b-Instruct | 6.026 | 6.449 | 5.603 | 5.0 | 6.95 | **5.8** | 4.15 | 7.167 | 5.85 | 7.278 |
| Llama-3-8b-Instruct | 5.948 | 6.662 | 5.224 | 4.727 | 7.8 | 3.9 | 2.8 | 7.333 | 6.053 | 7.0 |
| suzume-llama-3-8B-multilingual | 5.808 | 6.167 | 5.449 | 5.409 | 6.4 | 5.05 | 3.8 | 6.556 | 5.0 | 7.056 |
| saiga_llama3_8b | 5.471 | 5.896 | 5.039 | 3.0 | 7.4 | 3.55 | 3.5 | 6.444 | 5.15 | 7.812 |
| Mistral-7B-Instruct-v0.3 | 5.135 | 5.679 | 4.584 | 4.045 | 6.35 | 3.15 | 3.2 | 5.765 | 5.2 | 7.333 |

### 🏟️ [Arena](https://github.com/lm-sys/arena-hard-auto)

We used the Russian version of the Arena benchmark from [Vikhrmodels](https://huggingface.co/datasets/Vikhrmodels/ru-arena-general) and the [Arena Hard Auto](https://github.com/lm-sys/arena-hard-auto) codebase for evaluation. The baseline model was gpt3.5-turbo-0125 and the judge was gpt-4-1106-preview.

| Arena General | Score | 95% CI | Average Tokens |
|------------------------------------------------|:---------:|:--------:|:--------------:|
| **T-lite-instruct-0.1** | **57.26** | -2.9/2 | 870 |
| gpt3.5-turbo-0125 | 50 | 0/0 | 254 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 47.17 | -2.6/2.4 | 735 |
| Llama-3-8b-Instruct | 42.16 | -2.1/2.1 | 455 |
| saiga_llama3_8b | 39.88 | -2.3/2.5 | 616 |
| suzume-llama-3-8B-multilingual | 38.25 | -1.7/1.7 | 625 |
| Qwen2-7b-Instruct | 33.42 | -1.9/2.2 | 365 |
| Mistral-7B-Instruct-v0.3 | 28.11 | -2/2.2 | 570 |

## 👨‍💻 Examples of usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

torch.manual_seed(42)

model_name = "t-bank-ai/T-lite-instruct-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Russian prompt: "Write a recipe for a great pizza!"
messages = [
    {"role": "user", "content": "Напиши рецепт классной пиццы!"},
]

# Apply the chat template and add the generation prompt so the model starts its reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or the end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
)

# Decode only the newly generated tokens so the prompt is not echoed in the output.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

Output:

```
Конечно, вот рецепт для вкусной домашней пиццы, который можно адаптировать под разные вкусы и предпочтения. Важно, чтобы тесто было мягким и воздушным, а начинка — сочной и ароматной.

### Ингредиенты для теста:
- 500 г муки (лучше использовать смесь пшеничной и цельнозерновой)
- 1 ч. л. сухих дрожжей (или 7 г свежих)
- 1 ч. л. сахара
- 1 ч. л. соли
- 1 ст. л. оливкового масла
- 300 мл тёплой воды
- 1 яйцо (для смазки)

### Ингредиенты для начинки (примерный набор):
- 200 г томатного соуса (можно сделать самому из свежих помидоров или использовать готовый)
- 200 г моцареллы, нарезанной ломтиками
- 100 г сыра пармезан (тертый)
- 100 г ветчины или колбасы
- 100 г грибов (шампин
```

The model replies in Russian with a pizza recipe (ingredient lists for the dough and the topping); the sample output is truncated.
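
For multi-turn conversations, the single-turn example above can be wrapped in a small helper that keeps the message history. The following is a minimal sketch, not part of the original model card: `make_generate_reply`, `chat_turn`, and `stub_reply` are hypothetical names introduced for illustration, and the stub stands in for the real model so the snippet runs without downloading weights.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def make_generate_reply(model, tokenizer, max_new_tokens: int = 256) -> Callable[[List[Message]], str]:
    """Wrap an already loaded model/tokenizer pair into a history -> reply function.

    Hypothetical helper: it reuses the same calls as the usage example above.
    """
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]

    def generate_reply(history: List[Message]) -> str:
        input_ids = tokenizer.apply_chat_template(
            history, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        outputs = model.generate(
            input_ids, max_new_tokens=max_new_tokens, eos_token_id=terminators
        )
        # Keep only the tokens generated after the prompt.
        new_tokens = outputs[0][input_ids.shape[-1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate_reply


def chat_turn(
    history: List[Message],
    user_text: str,
    generate_reply: Callable[[List[Message]], str],
) -> List[Message]:
    """Append a user turn and the reply to it; the input history is not mutated."""
    extended = history + [{"role": "user", "content": user_text}]
    reply = generate_reply(extended)
    return extended + [{"role": "assistant", "content": reply}]


def stub_reply(history: List[Message]) -> str:
    """Stand-in for the model (an assumption for this demo): echoes the last user message."""
    return f"(stub reply to: {history[-1]['content']})"


history: List[Message] = []
history = chat_turn(history, "Привет!", stub_reply)    # "Hello!"
history = chat_turn(history, "Как дела?", stub_reply)  # "How are you?"

for message in history:
    print(f"{message['role']}: {message['content']}")
```

With a loaded model, passing `make_generate_reply(model, tokenizer)` in place of `stub_reply` would send the full history through the chat template on every turn.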