---
datasets:
- McAuley-Lab/Amazon-Reviews-2023
language:
- en
library_name: pytorch
pipeline_tag: text-generation
base_model: openai-community/gpt2-medium
---

# GPT-2 Medium - Review

## Model Details

**Model Description:** This model is a checkpoint of GPT-2 Medium, the **355M parameter** version of GPT-2, a transformer-based language model created and released by OpenAI. It was further pretrained with a causal language modeling (CLM) objective on English Amazon product reviews from the Fashion category.

- **Developed by:** Students at University of Konstanz
- **Model Type:** Transformer-based language model
- **Language(s):** English
- **Base Model:** [GPT2-medium](https://huggingface.co/openai-community/gpt2-medium)
- **Resources for more information:** [GitHub Repo](https://github.com/TomSOWI/DLSS-24-Synthetic-Product-Reviews-Generation)

## How to Get Started with the Model

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since generation relies on some randomness, we set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='TomData/GPT2-review')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
```

Here is how to run a forward pass on a given text in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
model = AutoModelForCausalLM.from_pretrained("TomData/GPT2-review")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)  # output.logits holds the next-token scores
```

and in TensorFlow:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomData/GPT2-review")
# from_pt=True converts the PyTorch weights; drop it if the repository ships TensorFlow weights.
model = TFAutoModelForCausalLM.from_pretrained("TomData/GPT2-review", from_pt=True)
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```

## Uses

This model is further pretrained to generate artificial product reviews. This can be useful for:

- Market research
- Product analysis
- Customer preferences
- Fashion trends
- Research

## Training

The model is further pretrained on the [Amazon Review Dataset](https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) from McAuley-Lab. For training, only the reviews from the Amazon Fashion category are used. See:

```python
from datasets import load_dataset

dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Amazon_Fashion", trust_remote_code=True)
```
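
This card does not describe how the reviews were preprocessed before training. A common way to prepare text for a CLM objective is to concatenate the tokenized reviews and cut the stream into fixed-size blocks. The sketch below illustrates that idea only; the function name `group_texts`, the block size, and the toy token ids are assumptions for illustration, not the authors' pipeline.

```python
from itertools import chain


def group_texts(tokenized_reviews, block_size):
    """Concatenate token-id lists and cut them into equal-sized blocks."""
    # Flatten all reviews into one long stream of token ids.
    stream = list(chain.from_iterable(tokenized_reviews))
    # Drop the remainder so every block has exactly block_size tokens.
    usable = (len(stream) // block_size) * block_size
    blocks = [stream[i:i + block_size] for i in range(0, usable, block_size)]
    # For causal language modeling the labels are copies of the inputs;
    # Hugging Face causal LM heads shift them by one position internally.
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}


# Toy token ids standing in for tokenizer output (hypothetical values).
toy_reviews = [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
batch = group_texts(toy_reviews, block_size=4)
print(batch["input_ids"])
```

With real data, the token-id lists would come from applying the tokenizer to the review text of the loaded dataset, and the block size would typically be chosen no larger than the model's context length.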