wassemgtk committed
Commit
9047354
1 Parent(s): 94bb2c3

Update README.md

Files changed (1)
  1. README.md +6 -17
README.md CHANGED
@@ -14,7 +14,7 @@ library_name: transformers
 license: cc-by-4.0
 
 
-# Writer-small 128M
+# Palmyra-small
 
 <style>
 img {
@@ -27,14 +27,7 @@ img {
 
 ## Model Description
 
-Writer-small 128M is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3 while. It has Tensor Parallelism (TP) of 1, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU.
-
-
-# GPT-J 6B
-
-## Model Description
-
-GPT-J 6B is a transformer model trained using Ben Wang's [Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax/). "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.
+Palmyra-small 128M is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3. It uses Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 <figure>
 
@@ -60,15 +53,11 @@ GPT-2/GPT-3.
 
 ## Training data
 
-GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).
-
-## Training procedure
-
-This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
+Palmyra-small 128M was trained on
 
 ## Intended Use and Limitations
 
-GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt.
+Palmyra-small learns an inner representation of the English language that can be used to extract features useful for downstream tasks. However, the model is best at what it was pretrained for, which is generating text from a prompt.
 
 ### How to use
 
@@ -77,8 +66,8 @@ This model can be easily loaded using the `AutoModelForCausalLM` functionality:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
-model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/palmyra-small")
+model = AutoModelForCausalLM.from_pretrained("EleutherAI/palmyra-small")
 ```
 
 ### Limitations and Biases
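
The "How to use" snippet committed above only loads the tokenizer and model. Below is a minimal generation sketch that builds on it, assuming the repo ID exactly as written in this commit ("EleutherAI/palmyra-small"); the prompt, device handling, and sampling settings are illustrative assumptions, not part of the committed README.

```python
# Minimal generation sketch for the loading snippet in the diff above.
# Assumption: the repo ID is taken verbatim from this commit and may not match
# the model's final hosting location.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/palmyra-small"  # as written in the committed README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A ~128M-parameter decoder-only model should fit on a single GPU; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

prompt = "The benefits of a small language model include"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the model card states TP=1 and PP=1, no model-parallel setup is assumed; the sketch simply places the whole model on one device.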