Regarding the test set


Is the text created by LLMs from the ELOQUENT teams the only dataset you will use to test our model? If that is the case, why do you ask for an is_human score at prediction time? It would be better to just predict the label directly (either human or generated). Please share your thoughts on this.


There will be at least two datasets: one is the holdout set of the bootstrap dataset, and the other comes from ELOQUENT. We may add further specialised test sets to check your model’s robustness against certain text transformations.

The “is_human” score is exactly equivalent to giving a label to each text. See my answer to your other question.
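To illustrate the equivalence, a continuous is_human score can always be mapped to a hard label by thresholding. This is only a sketch; the function name and the 0.5 threshold are assumptions for illustration, not part of the task's API:

```python
def score_to_label(is_human: float, threshold: float = 0.5) -> str:
    """Convert a continuous is_human score in [0, 1] into a hard label.

    The 0.5 threshold is an illustrative default; participants could
    tune it on the bootstrap holdout set.
    """
    return "human" if is_human >= threshold else "generated"

# Example: three scores mapped to labels.
labels = [score_to_label(s) for s in (0.92, 0.13, 0.50)]
```

Conversely, a system that only produces hard labels can submit scores of 1.0 for human and 0.0 for generated, so neither representation loses information relative to the other.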