wassemgtk committed
Commit
9047354
1 Parent(s): 94bb2c3

Update README.md

Files changed (1)
  1. README.md +6 -17
README.md CHANGED
@@ -14,7 +14,7 @@ library_name: transformers
 license: cc-by-4.0
 
 
-# Writer-small 128M
+# Palmyra-small
 
 <style>
 img {
@@ -27,14 +27,7 @@ img {
 
 ## Model Description
 
-Writer-small 128M is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3 while. It has Tensor Parallelism (TP) of 1, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU.
-
-
-# GPT-J 6B
-
-## Model Description
-
-GPT-J 6B is a transformer model trained using Ben Wang's [Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax/). "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.
+Palmyra-small 128M is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3. It uses Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 <figure>
 
@@ -60,15 +53,11 @@ GPT-2/GPT-3.
 
 ## Training data
 
-GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).
-
-## Training procedure
-
-This model was trained for 402 billion tokens over 383,500 steps on TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
+Palmyra-small 128M was trained on
 
 ## Intended Use and Limitations
 
-GPT-J learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt.
+Palmyra-small learns an inner representation of the English language that can be used to extract features useful for downstream tasks. However, the model is best at what it was pretrained for, which is generating text from a prompt.
 
 ### How to use
 
@@ -77,8 +66,8 @@ This model can be easily loaded using the `AutoModelForCausalLM` functionality:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
-model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/palmyra-small")
+model = AutoModelForCausalLM.from_pretrained("EleutherAI/palmyra-small")
 ```
 
 ### Limitations and Biases
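
The "How to use" snippet committed above only loads the tokenizer and model. Below is a minimal generation sketch that builds on it, assuming the repo ID exactly as written in this commit ("EleutherAI/palmyra-small"); the prompt, device handling, and sampling settings are illustrative assumptions, not part of the committed README.

```python
# Minimal generation sketch for the loading snippet in the diff above.
# Assumption: the repo ID is taken verbatim from this commit and may not match
# the model's final hosting location.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/palmyra-small"  # as written in the committed README
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A ~128M-parameter decoder-only model should fit on a single GPU; fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

prompt = "The benefits of a small language model include"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the model card states TP=1 and PP=1, no model-parallel setup is assumed; the sketch simply places the whole model on one device.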