fabiochiu/t5-base-tag-generation

fabiochiu committed on May 23, 2022

Commit a28a1f2 (1 parent: c5f2a16)

Update README.md

Files changed (1)

README.md +28 -0

README.md CHANGED

@@ -14,6 +14,34 @@ widget:
 This model is [t5-base](https://huggingface.co/t5-base) fine-tuned on the [190k Medium Articles](https://www.kaggle.com/datasets/fabiochiusano/medium-articles) dataset to predict article tags from the article's text.
+## How to use the model
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+# Load the fine-tuned tokenizer and model from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("fabiochiu/t5-base-tag-generation")
+model = AutoModelForSeq2SeqLM.from_pretrained("fabiochiu/t5-base-tag-generation")
+
+text = """
+Python is a high-level, interpreted, general-purpose programming language. Its
+design philosophy emphasizes code readability with the use of significant
+indentation. Python is dynamically-typed and garbage-collected.
+"""
+
+# Tokenize the article text, truncating it to at most 512 tokens
+inputs = tokenizer([text], max_length=512, truncation=True, return_tensors="pt")
+
+# Generate a comma-separated tag string (beam search combined with sampling)
+output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10,
+                        max_length=64)
+decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
+
+# Split the string into individual tags and drop duplicates
+tags = list(set(decoded_output.strip().split(", ")))
+print(tags)
+# Example output (varies between runs because do_sample=True):
+# ['Programming', 'Code', 'Software Development', 'Programming Languages',
+#  'Software', 'Developer', 'Python', 'Software Engineering', 'Science',
+#  'Engineering', 'Technology', 'Computer Science', 'Coding', 'Digital', 'Tech',
+#  'Python Programming']
+```
 ## Data cleaning
 The dataset is composed of Medium articles and their tags. However, each Medium article can have at most five tags, so the author must choose the tags they believe are best (mainly for SEO-related purposes). This means that an article with the "Python" tag may not have the "Programming Languages" tag, even though the former implies the latter.
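To make the implication problem concrete, here is a minimal sketch of how an article's tag set could be closed under a tag-implication map, so that "Python" also brings in "Programming Languages". The map `IMPLIES` and the function `expand_tags` are hypothetical and purely illustrative; the README does not state that the author expands tags this way.

```python
# Hypothetical implication map: each tag points to broader tags it implies.
# These entries are illustrative only; they are not taken from the dataset.
IMPLIES: dict[str, set[str]] = {
    "Python": {"Programming Languages"},
    "Programming Languages": {"Programming"},
    "Python Programming": {"Python"},
}


def expand_tags(tags: set[str], implies: dict[str, set[str]]) -> set[str]:
    """Return the given tags plus every tag they imply, directly or transitively."""
    expanded = set(tags)
    stack = list(tags)  # tags whose implications still need to be followed
    while stack:
        tag = stack.pop()
        for parent in implies.get(tag, set()):
            if parent not in expanded:  # skip tags already seen (also guards against cycles)
                expanded.add(parent)
                stack.append(parent)
    return expanded


article_tags = {"Python", "Data Science"}
print(sorted(expand_tags(article_tags, IMPLIES)))
# ['Data Science', 'Programming', 'Programming Languages', 'Python']
```

The `expanded` set doubles as the visited set, so a map containing a cycle (for example two tags that imply each other) cannot make the loop run forever.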