Update README
README.md
---
license: apache-2.0
---

# Introduction
This is the automatic paraphrasing model described and used in the paper "[AutoQA: From Databases to QA Semantic Parsers with Only Synthetic Training Data](https://arxiv.org/abs/2010.04806)" (EMNLP 2020).

# Training data
The training data is a cleaned version of the ParaBank 2 dataset introduced in "[Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering](https://aclanthology.org/K19-1005/)".
ParaBank 2 is a paraphrasing dataset constructed by back-translating the Czech portion of an English-Czech parallel corpus.
We use the subset of 5 million sentence pairs with the highest dual conditional cross-entropy scores (which correspond to the highest paraphrasing quality), and keep only one of the five paraphrases provided for each sentence.
Cleaning consisted of removing sentences that do not look like normal English, e.g. sentences that contain URLs or too many special characters.
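The exact rules are not spelled out here, but the filtering step might look roughly like the sketch below. The score ordering follows the description above, while the file name, the TSV layout, and the 0.2 special-character threshold are illustrative assumptions:

```python
import re

def looks_like_normal_english(sentence: str, max_special_ratio: float = 0.2) -> bool:
    """Heuristic: reject sentences containing URLs or too many special characters."""
    if re.search(r"https?://|www\.", sentence):
        return False
    # Characters outside letters, digits, whitespace, and common punctuation
    # count as "special".
    special = sum(
        1 for ch in sentence
        if not (ch.isalnum() or ch.isspace() or ch in ".,;:!?'\"()-")
    )
    return special / max(len(sentence), 1) <= max_special_ratio

# Rank pairs by dual conditional cross-entropy score, keep the top 5 million,
# and use only the first paraphrase for each sentence.
# (The file name and column layout are assumptions, not the official release format.)
pairs = []
with open("parabank2.tsv", encoding="utf-8") as f:
    for line in f:
        score, source, first_paraphrase, *_ = line.rstrip("\n").split("\t")
        pairs.append((float(score), source, first_paraphrase))

pairs.sort(key=lambda p: p[0], reverse=True)
cleaned = [
    (src, tgt) for _, src, tgt in pairs[:5_000_000]
    if looks_like_normal_english(src) and looks_like_normal_english(tgt)
]
```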

# Training procedure
The model is fine-tuned for 4 epochs on the above-mentioned dataset, starting from the `facebook/bart-large` checkpoint.
We use a token-level cross-entropy loss computed against the gold paraphrase. To ensure the output of the model is grammatical, during training we use the back-translated (machine-generated English) sentence as the input and the original human-written English sentence as the output. Training uses mini-batches of 1280 examples; for higher training efficiency, each mini-batch is constructed by grouping sentences of similar length together.
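As a rough sketch, this setup maps onto the Hugging Face `Seq2SeqTrainer`, where `group_by_length` implements the similar-length batching. The dataset object and the split of the 1280-example mini-batch into per-device batch size times gradient-accumulation steps are assumptions:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# Each example: input_ids = back-translated (machine-generated) sentence,
# labels = original human-written English sentence.
train_dataset = ...  # tokenized ParaBank 2 subset, prepared as described above

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-paraphraser",
    num_train_epochs=4,              # fine-tune for 4 epochs
    per_device_train_batch_size=80,  # 80 * 16 = 1280 examples per mini-batch
    gradient_accumulation_steps=16,  # (the 80/16 split is an assumption)
    group_by_length=True,            # batch sentences of similar length together
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()  # labels give the default token-level cross-entropy loss
```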

# How to use
Sampling with `top_p=0.9` and a `temperature` between `0` and `1` usually produces good paraphrases. Higher temperatures make the paraphrases more diverse and more different from the input, but might slightly change the meaning of the original sentence.
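For example, with the `transformers` library (the repository id below is a placeholder; substitute the actual id under which this checkpoint is hosted):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "<this-model-repo>"  # placeholder: replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("show me restaurants rated at least 4 stars", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,   # sampling must be enabled for top_p / temperature to apply
    top_p=0.9,
    temperature=0.7,  # raise toward 1 for more diverse paraphrases
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```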

# Citation
If you use this model in your work, please use the following citation:

```