There is a reply to the original Chinese post (which is linked in the Reddit post):
(translated by Google)
"In the past two days, I have humbly listened to feedback from all parties (for example, defects in coding, creative writing, etc. must be improved). I hope that there will be improvements in the next version.
However, we have never overfit the test set just to fake the benchmarks. My real name is Licheng Yu, and I have handled the posttraining of two OSS models. Please let me know which prompt was selected from the test set and put into the training set, and I will apologize!"
Why signal-boost this then?
Did you mean why I shared the refuting comment? Since I cannot verify the authenticity of either comment, I believe it is even more important to provide both sides of the argument, especially when the refuting comment is available only in Chinese.
No, I meant why you posted the Reddit thread in the first place.
The discussion thread on Reddit is eye-opening. People think it’s normal to train on your test data.
EDIT: Ok, seems this sentiment wasn’t shared by other commenters (or Reddit’s UI hid those comments).
I read the comments, and the single person who implied that got multiple replies categorically stating that it's not acceptable.