LUAR
Stylistic Representations using LUAR: https://aclanthology.org/2021.emnlp-main.70/
Author Style Representations using LUAR.
The LUAR training and evaluation repository can be found here.
This particular model was trained on a subsample of the Pushshift Reddit Dataset (5 million users), using comments published between January 2015 and October 2019 by authors who published at least 100 comments during that period.
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("rrivera1849/LUAR-CRUD")
# LUAR is a custom architecture hosted on the Hub, so loading it requires trusting remote code
model = AutoModel.from_pretrained("rrivera1849/LUAR-CRUD", trust_remote_code=True)
# we embed `episodes`, a collection of documents presumed to come from an author
# NOTE: make sure that `episode_length` is consistent across each episode
batch_size = 3
episode_length = 16
text = [
["Foo"] * episode_length,
["Bar"] * episode_length,
["Zoo"] * episode_length,
]
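# flatten the nested episodes into a single list of documents before tokenizing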
text = [j for i in text for j in i]
tokenized_text = tokenizer(
text,
max_length=32,
padding="max_length",
truncation=True,
return_tensors="pt"
)
# inputs size: (batch_size, episode_length, max_token_length)
tokenized_text["input_ids"] = tokenized_text["input_ids"].reshape(batch_size, episode_length, -1)
tokenized_text["attention_mask"] = tokenized_text["attention_mask"].reshape(batch_size, episode_length, -1)
print(tokenized_text["input_ids"].size()) # torch.Size([3, 16, 32])
print(tokenized_text["attention_mask"].size()) # torch.Size([3, 16, 32])
out = model(**tokenized_text)
print(out.size()) # torch.Size([3, 512])
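# each row of `out` is a 512-dimensional style embedding for one author's episode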
# to get the Transformer attentions:
out, attentions = model(**tokenized_text, output_attentions=True)
print(attentions[0].size()) # torch.Size([48, 12, 32, 32])
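Since each row of `out` is a 512-dimensional author embedding, authors can be compared directly by vector similarity. The short sketch below is an illustrative addition, not part of the original card: it computes pairwise cosine similarities between the three episodes embedded above.
import torch.nn.functional as F
# `out` is the (3, 512) tensor of author embeddings from the snippet above;
# cosine similarity between rows gives a style-similarity score for each pair of authors
pairwise_sims = F.cosine_similarity(out.unsqueeze(1), out.unsqueeze(0), dim=-1)
print(pairwise_sims.size())  # torch.Size([3, 3])
# e.g. pairwise_sims[0, 1] compares the "Foo" and "Bar" authors; values closer to 1
# suggest more similar writing style under the LUAR representation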
If you find this model helpful, feel free to cite our publication.
@inproceedings{uar-emnlp2021,
author = {Rafael A. Rivera Soto and Olivia Miano and Juanita Ordonez and Barry Chen and Aleem Khan and Marcus Bishop and Nicholas Andrews},
title = {Learning Universal Authorship Representations},
booktitle = {EMNLP},
year = {2021},
}
LUAR is distributed under the terms of the Apache License (Version 2.0).
All new contributions must be made under the Apache-2.0 license.