BeibitDS committed (verified)
Commit ca1ccdc · 1 Parent(s): dd31236

Update README.md

Files changed (1)
  1. README.md +2 -62
README.md CHANGED
@@ -17,6 +17,8 @@ tags:
 - **License:** apache-2.0
 - **Finetuned from model :** Meta-Llama-3-8B
 
+This model underwent Continuous Pretraining (CPT) on an extensive Kazakh text corpus to optimize LLAMA3 for the Kazakh language. It was subsequently fine-tuned with Kazakh-language instructional data. The model demonstrates strong performance in processing Kazakh text, answering text-based questions, correcting punctuation and grammar, and summarizing text. However, there is still room for improvement in handling open-ended questions.
+
 ## Requirements
 To install the necessary dependencies, use the following commands:
 
@@ -44,65 +46,3 @@ To install the necessary dependencies, use the following commands:
 text_streamer = TextStreamer(tokenizer)
 _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
 ```
-
-- running inference with threading and parameters
-```python
-from threading import Thread
-import textwrap
-
-def generate_streaming_text(generation_kwargs):
-    max_print_width = 2048
-    thread = Thread(target=model.generate, kwargs=generation_kwargs)
-    thread.start()
-    length = 0
-    for j, new_text in enumerate(text_streamer):
-        if new_text == '<|end_of_text|>':
-            break
-        if j == 0:
-            wrapped_text = textwrap.wrap(new_text, width=max_print_width)
-            length = len(wrapped_text[-1])
-            wrapped_text = " ".join(wrapped_text)
-            print(wrapped_text, end="")
-        else:
-            length += len(new_text)
-            if length >= max_print_width:
-                length = 0
-                print()
-            print(new_text, end="")
-    return
-
-input_texts = [
-    "Сұрақ: Желтоқсан айында неше күн бар? \nЖауабы: ",
-    'Грамматикалық қателерді дұрыста."\n\n### Мәтін:\nОған бұйермады\n\n### Жауабы:'
-]
-
-from transformers import TextIteratorStreamer
-import torch
-
-text_streamer = TextIteratorStreamer(tokenizer)
-
-if tokenizer.pad_token_id is None:
-    tokenizer.pad_token_id = tokenizer.eos_token_id
-inputs = tokenizer(
-    input_texts[0],
-    return_tensors="pt",
-    padding=True,
-    truncation=True,
-    max_length=512,
-).to("cuda")
-attention_mask = inputs['attention_mask']
-generation_kwargs = {
-    'input_ids': inputs['input_ids'],
-    "streamer": text_streamer,
-    "max_new_tokens": 280,
-    # "use_cache": True,
-    'pad_token_id': tokenizer.pad_token_id,
-    'attention_mask': attention_mask,
-    # 'no_repeat_ngram_size': 6,
-    # 'temperature': 0.4,
-    # 'top_k': 20,
-    # 'top_p': 0.95,
-}
-
-generate_streaming_text(generation_kwargs)
-```
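
For reference, here is a minimal, self-contained sketch of the streaming usage that the README keeps after this change (the `TextStreamer` context lines above), reusing one of the Kazakh example prompts from the removed block. The repo id `BeibitDS/<model-name>` is a placeholder, and loading via plain `transformers` `AutoModelForCausalLM`/`AutoTokenizer` is an assumption; the README's actual setup may load the checkpoint differently (e.g. through unsloth).

```python
# Minimal sketch, assuming a standard transformers loading path.
# "BeibitDS/<model-name>" is a placeholder repo id, not the actual one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "BeibitDS/<model-name>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# Example Kazakh prompt from the removed block
# ("Question: How many days are in December? Answer:").
prompt = "Сұрақ: Желтоқсан айында неше күн бар? \nЖауабы: "
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# TextStreamer prints tokens to stdout as they are generated.
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
```

The removed block achieved the same effect with `TextIteratorStreamer` plus a background thread, which gives more control over line wrapping and generation parameters; `TextStreamer` is the simpler choice when printing directly to stdout is enough.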