This is a merge of the LoRA that nateraw trained on an English-to-Hinglish translation dataset for llama2-7b, applied to OpenHathi-7B-Base. Since OpenHathi has more Hindi data in its pretraining than llama2, the translation quality is significantly better.
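For reference, a merge like this can be reproduced with `peft` by loading the adapter on top of the base model and merging the weights in. The sketch below is only illustrative: both repo ids are assumptions, not confirmed by this card.

```
# Minimal sketch of such a LoRA merge using peft.
# NOTE: both repo ids below are assumptions, used only for illustration.
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# Assumed id for the OpenHathi-7B base model
base = LlamaForCausalLM.from_pretrained("sarvamai/OpenHathi-7B-Hi-v0.1-Base", torch_dtype=torch.bfloat16)

# Assumed id for nateraw's English-to-Hinglish LoRA (originally trained against llama2-7b)
merged = PeftModel.from_pretrained(base, "nateraw/english-to-hinglish-lora").merge_and_unload()

# Save the merged weights as a standalone model
merged.save_pretrained("OpenHathi-7B-English-to-Hinglish")
```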

## Prompting
You can use the prompt template provided by nateraw:

`"Translate from english to hinglish:\n{{en}}\n---\nTranslation:\n"`

**Sample code**:
```
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

device = "cuda:0"
tokenizer = AutoTokenizer.from_pretrained('akashgoel-id/OpenHathi-7B-English-to-Hinglish')
model = LlamaForCausalLM.from_pretrained('akashgoel-id/OpenHathi-7B-English-to-Hinglish', torch_dtype=torch.bfloat16).to(device)

while True:
    # (assumed loop body: read an English sentence and fill the prompt template shown above)
    en = input("English: ")
    prompt = f"Translate from english to hinglish:\n{en}\n---\nTranslation:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    generate_ids = model.generate(inputs.input_ids, max_length=500)
    print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
```

## Limitations
The model still struggles with idioms:

1) Input : When it rains, it pours
 
  Evaluation : This is a literal translation and doesn't quite capture the idiomatic meaning of avoiding the main point or not speaking directly about a subject. The phrase "Ghumaphira ke baat karna" would be more appropriate.

## Next steps
1) The model seems to be highly censored, since it is based on llama2. The next step would be to reduce some of that censorship by finetuning on more uncensored data (similar to what WizardLM did for llama2).
2) Finetune on idioms.