Model does not produce good predictions when doing QA with Reddit questions
A user is using https://www.reddit.com/r/AskReddit to input questions, and the model generates many answers related to sex (a minimal reproduction sketch follows the examples below). Should this be better disclosed in the model card?
Input: what do you have dreams of innovating?
A: a new type of sex
Input: what's something this generation normalized that shouldn't be normalized at all?
A: sexual assault
Input: what is something everyone hates, but you like?
A: sex
Input: What is socially unacceptable but really shouldn't be?
A: sex
Input: What is your favorite "your mom" joke?
A: "Your mom is a slut"
Input: if you could invent anything useless but useful at the same time, what would it be?
A: sex toy
https://www.reddit.com/r/AskReddit/comments/v0yxtf/if_you_could_invent_anything_useless_but_usefull/
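For anyone who wants to reproduce this, here is a minimal sketch using the `transformers` library. The checkpoint and generation settings are assumptions, not necessarily the original setup; the smaller T0_3B checkpoint is used only to keep the example cheap.

```python
# Minimal reproduction sketch. Swap in "bigscience/T0pp" to match the
# outputs reported above; T0_3B is used here to keep the example light.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/T0_3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

questions = [
    "what do you have dreams of innovating?",
    "what's something this generation normalized that shouldn't be normalized at all?",
    "what is something everyone hates, but you like?",
    "What is socially unacceptable but really shouldn't be?",
]

for question in questions:
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(question, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```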
Hi. I am the one who reported it on Discord. I just found this out this morning. It shouldn't take much time to see that it keeps giving answers containing the word "sex".
Thanks for reporting this @osansievero, @JonathanSum! We should reflect this in the bias and fairness section of the model card. Would either of you like to open a PR? I would be happy to dive into it but have very limited bandwidth this week.
(Btw, we should also figure out where exactly this is coming from; I suspect it's one of the fine-tuning datasets.)
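A rough way to start checking that suspicion would be to count how often "sex" shows up in the targets of the P3 subsets used for fine-tuning; something like the sketch below. The field name and sampling caps are assumptions based on the bigscience/P3 schema on the Hub, and depending on your `datasets` version you may need `trust_remote_code=True` for script-based datasets.

```python
# Rough sketch: sample a few P3 subsets and count how many targets contain "sex".
from datasets import get_dataset_config_names, load_dataset

configs = get_dataset_config_names("bigscience/P3")

for config in configs[:5]:  # only a handful of subsets, to keep the sketch cheap
    ds = load_dataset("bigscience/P3", config, split="train", streaming=True)
    hits = total = 0
    for example in ds:
        total += 1
        if "sex" in example.get("targets_pretokenized", "").lower():
            hits += 1
        if total >= 10_000:  # sample cap so this finishes quickly
            break
    print(f"{config}: {hits}/{total} sampled targets mention 'sex'")
```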
@VictorSanh
I opened a PR to add those six examples to the bias and fairness section of the model card.
I created the pull requests for the following models:
T0 11 billion
T0p 11 billion
T0pp 11 billion
T0_single_prompt 11 billion
T0_original_task_only 11 billion
T0_3B
I just want to add one more thing. I don't feel the issue is just bias; it looks similar to the mode collapse problem we see in GANs. The T0pp model very often answers with the word "sex". My guess is that most people give sexually related answers on a Reddit-like forum, so the later layers become biased toward those answers (though perhaps not the final layer, since some users said the dataset filtered the word "sex"). Because the later layers lean toward sexually related content, the model then picks the more general word "sex" as the answer with a higher score or lower loss, even though the word itself was filtered. 🤔😆
Of course, the model also has bias issues, such as answering "sexual assault" for pure statistics questions.
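If someone wants to check how strong this preference is, a cheap sanity check would be to compare the log-likelihood the model assigns to "sex" against a few other short answers, instead of only looking at greedy generations. The checkpoint and candidate list below are just assumptions for illustration.

```python
# Sketch: score candidate answers by total log-likelihood under the model.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "bigscience/T0_3B"  # smaller checkpoint, as an assumption
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
model.eval()

question = "What is socially unacceptable but really shouldn't be?"
candidates = ["sex", "crying in public", "talking to strangers", "being unemployed"]

inputs = tokenizer(question, return_tensors="pt")
for candidate in candidates:
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        # The loss is the mean negative log-likelihood per target token;
        # multiply by the target length to get the total log-probability.
        loss = model(**inputs, labels=labels).loss
    total_logprob = -loss.item() * labels.shape[1]
    print(f"{candidate!r}: total log-prob {total_logprob:.2f}")
```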