metadata
language:
  - en
tags:
  - macroeconomics
  - automated summary evaluation
  - wording
license: apache-2.0
metrics:
  - mse

Content Model

This is a Longformer model with a regression head, designed to predict the Content score of a summary.

Corpus

It was trained on a corpus of 4,233 summaries of 101 sources compiled by Botarleanu et al. (2022). The summaries were graded by expert raters on six criteria: Details, Main Point, Cohesion, Paraphrasing, Objective Language, and Language Beyond the Text. A principal component analysis was used to reduce these six outcome variables to two components:

  • Content includes Details, Main Point, and Cohesion
  • Wording includes Paraphrasing, Objective Language, and Language Beyond the Text

Score

This model predicts the Content score. The model to predict the Wording score can be found here. The following diagram illustrates the model architecture:

[Figure: model diagram]

When providing input to the model, concatenate the summary and the source using the separator token </s>. This gives the model access to both the summary and the source, which allows it to provide more accurate scores. The model achieved an R² of 0.66 on the test set of summaries.

[Figure: wording scatter]
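The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the helper name `build_model_input`, the summary-first order, and the absence of whitespace around the separator are all assumptions, since this card only states that the two texts are joined with `</s>`.

```python
SEP_TOKEN = "</s>"  # separator token named in this model card


def build_model_input(summary: str, source: str, sep: str = SEP_TOKEN) -> str:
    """Join a summary and its source text with the separator token.

    Hypothetical helper: the summary-first order and the lack of spaces
    around the separator are assumptions, not documented behavior.
    """
    # Strip stray leading/trailing whitespace so the separator sits
    # directly between the two texts.
    return f"{summary.strip()}{sep}{source.strip()}"


if __name__ == "__main__":
    # Made-up example texts, used only to show the resulting string.
    text = build_model_input(
        "Inflation is a general rise in prices over time.",
        "Inflation refers to a sustained increase in the general price level of goods and services.",
    )
    print(text)
```

The resulting string would then be tokenized and passed to the model. Loading the tokenizer and the model is left out of this sketch because the exact repository identifier is not stated in this card.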

Contact

For questions or comments about this model, please contact [email protected].