Deep learning is fun, until it doesn't work as intended.
I still remember the first and only time I tried to write a paper on NLP. The main part wasn't deep learning, but I needed to verify my method and compare it against other methods. At the time I was sticking with Java, so I had to come up with my own network structure just to see whether my method brought any improvement.
It kept trending toward working without ever actually working. Given that my knowledge came casually from YouTube videos and articles linked in random toots, I was probably doing something wrong at the time. But even so, I had no idea how to make it reliably work. The training process itself is guesswork: try different parameters and see which works best. Between my poor HP laptop with its GTX 1060 and the power going out at night (I don't know if that's an Asia thing, but the school cut the power overnight to "protect" its students), getting meaningful results was impossible.
And after I finished, I found I wasn't the only one who had come up with the method. About a month earlier, a big Chinese company had done the same thing, and done it better than me.
¯\_(ツ)_/¯
Anyway, I still enjoy the way you can sense a word's meaning through the projection of vectors. It's romantic, in a math way.
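To show what I mean, here is a minimal sketch, assuming plain word embeddings in the word2vec/GloVe style. The tiny 4-dimensional vectors are made up purely for illustration, not from any real model, and this is not the method from my paper:

```python
# Toy illustration: project a word's vector onto a direction built
# from other words, and the sign/size of the projection hints at
# what the word "means" along that axis.
import numpy as np

emb = {  # hypothetical 4-d embeddings, hand-made for this example
    "king":  np.array([0.9, 0.8, 0.1, 0.3]),
    "queen": np.array([0.9, 0.1, 0.8, 0.3]),
    "man":   np.array([0.5, 0.9, 0.1, 0.2]),
    "woman": np.array([0.5, 0.1, 0.9, 0.2]),
}

def project(v, direction):
    """Scalar projection of v onto the given direction."""
    d = direction / np.linalg.norm(direction)
    return float(v @ d)

# A "gender" direction, in the spirit of the word2vec analogies.
gender = emb["woman"] - emb["man"]

for word in ("king", "queen"):
    print(f"{word}: {project(emb[word], gender):+.2f}")
# queen lands on the "woman" side of the axis, king on the "man" side.
```

With real embeddings, this same one-line projection is roughly how the famous king − man + woman ≈ queen analogies are usually visualized.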
----
But what is GPT? A visual introduction to Transformers | Deep Learning, Chapter 5
by 3Blue1Brown
https://www.youtube.com/watch?v=wjZofJX0v4M