This model is intended for automatic podcast summarisation. Given a podcast transcript as input, the objective is to produce a short text summary that a user might read when deciding whether to listen to the episode. The summary should accurately convey the content of the podcast, be human-readable, and be short enough to be read quickly on a smartphone screen.

## Training and evaluation data

In our solution, an extractive module selects salient chunks from the transcript, and these chunks serve as the input to an abstractive summariser.
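
As an illustration only, the extractive step could be sketched as follows. This README does not say how salience is scored, so the example assumes a simple heuristic: the transcript is split into fixed-size word chunks, each chunk is scored by TF-IDF cosine similarity to the whole transcript, and the top `k` chunks are kept in their original order. The function names and the `chunk_size`/`k` values are hypothetical, and this is not the model's actual extractive module.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50) -> list[str]:
    """Split a transcript into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def _tokens(text: str) -> list[str]:
    """Lower-case word tokens used for TF-IDF scoring."""
    return re.findall(r"[a-z0-9']+", text.lower())


def select_salient_chunks(transcript: str, k: int = 3, chunk_size: int = 50) -> list[str]:
    """Hypothetical salience heuristic: return the k chunks whose TF-IDF vectors
    are most cosine-similar to the whole transcript, kept in original order."""
    chunks = chunk_words(transcript, chunk_size)
    if len(chunks) <= k:
        return chunks
    docs = [Counter(_tokens(c)) for c in chunks]
    n = len(docs)
    # Document frequency: number of chunks containing each term.
    df = Counter(term for d in docs for term in d)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def tfidf(counts: Counter) -> dict[str, float]:
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # The whole transcript acts as the "centroid" each chunk is compared against.
    centroid = tfidf(Counter(_tokens(transcript)))
    scores = [cosine(tfidf(d), centroid) for d in docs]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Restore transcript order so the abstractive model sees a coherent sequence.
    return [chunks[i] for i in sorted(top)]
```

The selected chunks can then be joined, e.g. `" ".join(select_salient_chunks(transcript))`, to form a shorter input for the abstractive model. Keeping the original order preserves the narrative flow of the episode.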

We perform extensive pre-processing of the creator-provided episode descriptions to select a subset of the corpus that is suitable for training a supervised model.
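
The exact filtering criteria are not described in this README. A minimal sketch, assuming two hypothetical rules (the description must be present, and its length after removing URLs must fall within illustrative `min_words`/`max_words` bounds), could look like this. `Episode`, `is_suitable`, `filter_corpus`, and the thresholds are placeholders, not the actual pre-processing pipeline.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    """Hypothetical record: one podcast episode with its creator description."""
    episode_id: str
    transcript: str
    description: Optional[str]


URL_RE = re.compile(r"https?://\S+")


def clean_description(text: str) -> str:
    """Illustrative cleaning step: drop URLs and collapse whitespace."""
    return " ".join(URL_RE.sub(" ", text).split())


def is_suitable(ep: Episode, min_words: int = 10, max_words: int = 300) -> bool:
    """Hypothetical filter: the description must exist and, once cleaned,
    have a length plausible for a reference summary."""
    if not ep.description:
        return False
    n_words = len(clean_description(ep.description).split())
    return min_words <= n_words <= max_words


def filter_corpus(episodes: list[Episode], **kwargs) -> list[Episode]:
    """Keep only episodes that pass is_suitable (kwargs are forwarded to it)."""
    return [ep for ep in episodes if is_suitable(ep, **kwargs)]
```

Rejecting missing descriptions first matters because, without a reference description, an episode cannot serve as a supervised training or evaluation example.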

We split the filtered dataset into train/dev sets of 69,336/7,705 episodes.
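
The README does not state how the split was drawn. Assuming a seeded random hold-out (an assumption, not the documented procedure), the reported sizes correspond to 77,041 filtered episodes with roughly 10% held out for development. The `train_dev_split` helper and the episode IDs are hypothetical placeholders.

```python
import random


def train_dev_split(items, dev_size: int, seed: int = 0):
    """Shuffle a copy of items with a fixed seed and hold out dev_size of them."""
    if not 0 <= dev_size <= len(items):
        raise ValueError("dev_size must be between 0 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[dev_size:], shuffled[:dev_size]


# Sizes reported in this README: 69,336 train + 7,705 dev = 77,041 filtered episodes,
# i.e. a dev share of about 10%. The IDs below are placeholders.
episode_ids = [f"ep{i}" for i in range(69_336 + 7_705)]
train, dev = train_dev_split(episode_ids, dev_size=7_705, seed=42)
print(len(train), len(dev))  # 69336 7705
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.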

The test set consists of 1,027 episodes, of which only 1,025 were used because two of them did not contain an episode description.