Écspielle Kay
1 min read · Dec 11, 2019


Ah, I see. That's a really low loss; for GPT-2 (345M), any text file under 1 MB will generally lead to overfitting, especially with a dataset as small as 50 positive and negative reviews. Or maybe the repo you're using calculates loss differently? (The lowest I've ever gotten was 0.3 for William Gibson's novels and 0.89 for the SCP Wiki.) In any case, this sounds like a fun experiment. If you wanted to do it again, there are sentiment analysis datasets in CSV format on Kaggle, and 117M would definitely be large enough; 345M might be overkill. If you don't mind the hassle, Google Colab now has P100 GPUs, which can usually get you through 3000 steps an hour. Anyway, very cool read, interested to see what you do next!
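For reference, a re-run along those lines might look something like this. This is a minimal sketch assuming Max Woolf's gpt-2-simple package (the thread never names the repo actually used) and a hypothetical Kaggle CSV called reviews.csv; swap in whatever repo and dataset you're actually working with.

```python
# Rough sketch of the suggested re-run, assuming the gpt-2-simple
# package (an assumption; the original thread doesn't name a repo).
import gpt_2_simple as gpt2

# 117M should be plenty for this task; 345M might be overkill.
gpt2.download_gpt2(model_name="117M")

sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset="reviews.csv",  # hypothetical Kaggle sentiment CSV, one review per row
    model_name="117M",
    steps=3000,             # roughly an hour on a Colab P100
    print_every=100,        # log the training loss every 100 steps
    sample_every=500,       # print sample generations every 500 steps
)

# Generate text from the fine-tuned checkpoint to eyeball the results.
gpt2.generate(sess)
```

Watching the printed loss and the periodic samples together is the easiest way to spot overfitting: if the loss drops very low and the samples start reproducing the training reviews verbatim, the model has memorized the dataset.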
