The paper clearly states that knowledge distillation was used, and it never claims to be an original base model. At minimum it shows that model compression is a viable path, yet some insist on elevating it into a political issue.
Square-Creator-4028806e5
--
DeepSeek is a grand illusion. Let me explain the situation clearly: an elementary school student who always fails their essays reads a few collections of award-winning essays, memorizes them through hard work, and can finally dash off a passing essay by copying others. Then you tell me they write fast and can compete with the award-winning authors in those books? Drink less of that toxic chicken-soup fake liquor that leaves you feeling high and mighty, thinking you're the best in the world. #deepseek #CryptoMarketCorrection