teddy-f-47 committed
Commit d0f5976
1 Parent(s): a702998

Update README.md

Files changed (1):
  1. README.md +36 -10
README.md CHANGED
@@ -8,42 +8,68 @@ model-index:
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # phi-2-pl-v_0_1
 
- This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
+ This model is based on the [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) architecture and was trained from scratch on the 20231201 Polish Wikipedia dump.
 
  ## Model description
 
- More information needed
+ The model was trained with a context length of 2048 tokens.
 
  ## Intended uses & limitations
 
- More information needed
+ The model is intended for research purposes only. It may generate fictitious, incorrect, unethical, or biased text. In its current state, it is not suitable for production use.
+
+ Example usage:
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+
+ # Assumed repository id; the original snippet leaves model_name undefined.
+ model_name = "teddy-f-47/phi-2-pl-v_0_1"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name, trust_remote_code=True, use_fast=True
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name, vocab_size=len(tokenizer), attn_implementation="flash_attention_2",
+     trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ model.eval()
+
+ generation_config = GenerationConfig.from_pretrained(
+     model_name, do_sample=False, repetition_penalty=1.5,
+     min_new_tokens=1, max_new_tokens=128
+ )
+
+ test_input = tokenizer("Wrocław to polskie miasto. Wrocław jest ", return_tensors='pt').to(torch.device('cuda'))
+ test_output = model.generate(**test_input, generation_config=generation_config)
+ test_preds = tokenizer.batch_decode(sequences=test_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
+ print(test_preds)
+ ```
 
  ## Training and evaluation data
 
- More information needed
+ The model was trained on the 20231201 Polish Wikipedia dump.
 
  ## Training procedure
 
+ ### Training environment
+
+ - GPU: 1 x A100X (80GB)
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
+ - num_devices: 1
  - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
+ - gradient_accumulation_steps: 1
  - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1
+ - precision: bf16
+ - seed: 42
 
  ### Training results
 
-
+ - runtime: 1mo 3d 9h 40m 16s
+ - train_loss: 2.983
 
  ### Framework versions
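
The hyperparameters listed in the updated card map naturally onto a Hugging Face `TrainingArguments` configuration. The sketch below is illustrative only, assuming the Hugging Face Trainer was used (the card does not state the training framework); `output_dir` is a placeholder, not a path from the repository.

```
from transformers import TrainingArguments

# Illustrative sketch only: the card does not confirm the HF Trainer was used.
# output_dir is a placeholder, not a path from the original repository.
training_args = TrainingArguments(
    output_dir="phi-2-pl-v_0_1",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,
    seed=42,
)
```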
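
For context, if the reported train_loss of 2.983 is the mean per-token cross-entropy in nats (the usual convention for Hugging Face training logs, though the card does not say), it implies a training perplexity of roughly exp(2.983) ≈ 19.7:

```
import math

# Assumption: train_loss is mean cross-entropy per token, in nats.
train_loss = 2.983
print(math.exp(train_loss))  # ≈ 19.7
```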