How does file-level span corruption work?

#2
by zeromquan - opened

The span corruption part is not clear to me. Could you help explain it?
The paper gives only a brief description:
"We choose span corruption as the base infill objective following InCoder (Fried et al., 2022). However, we take a different approach in selecting the spans for corruption: (1) we first sample a dynamic ratio of sequence to mask out, (2) we then sample the span length and mask out locations such that the total number of tokens match the ratio of the original sequence determined earlier"
