{"id":125,"date":"2025-02-08T18:58:15","date_gmt":"2025-02-08T18:58:15","guid":{"rendered":"https:\/\/artificial-intelligence.news\/?p=125"},"modified":"2025-02-08T19:58:50","modified_gmt":"2025-02-08T19:58:50","slug":"deepseek-and-the-mother-of-invention","status":"publish","type":"post","link":"https:\/\/artificial-intelligence.news\/?p=125","title":{"rendered":"DeepSeek and the Mother of Invention"},"content":{"rendered":"\n<p>The fable of &#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/The_Crow_and_the_Pitcher\">The Crow and the Pitcher<\/a>&#8221; was commented on in the 4th-5th century by Avianus, that &#8220;<strong>thoughtfulness is superior to brute strength<\/strong>&#8220;.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"355\" height=\"441\" src=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-5.png\" alt=\"\" class=\"wp-image-136\" style=\"width:342px;height:auto\" srcset=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-5.png 355w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-5-241x300.png 241w\" sizes=\"auto, (max-width: 355px) 100vw, 355px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/The_Crow_and_the_Pitcher#\/media\/File:The_Crow_and_the_Pitcher_-_Project_Gutenberg_etext_19994.jpg\">The crow and pitcher<\/a><\/figcaption><\/figure>\n\n\n\n<p>This echoed true in the latest news from the quickly evolving AI frontier. When DeepSeek released their open source <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-R1\">DeepSeek-R1<\/a> model, based on <a href=\"https:\/\/api-docs.deepseek.com\/news\/news1226\">DeepSeek-V3<\/a> on 20th January 2025. This is a chain-of-thought model rivaling OpenAI&#8217;s flagship o1 reasoning model. 
Its training cost was claimed to be around $6 million, compared to GPT-4&#8217;s reported $100 million.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sanctions and Innovation<\/h2>\n\n\n\n<p>DeepSeek is owned by the Chinese quantitative trading firm <a href=\"https:\/\/www.high-flyer.cn\/en\/fund\/\">High-Flyer<\/a>, run by CEO Liang Wenfeng. This makes the company <a href=\"https:\/\/www.fibermall.com\/blog\/nvidia-ai-chip.htm\">subject to United States sanctions<\/a>, which limit exports of AI chips to those less powerful than Nvidia&#8217;s A100. <\/p>\n\n\n\n<p>This primarily means that exported chips must have:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An I\/O bandwidth (the speed of communication within the chip and with external devices) of less than 600 GB\/s.<\/li>\n\n\n\n<li>A bit length per operation multiplied by raw TOPS (tera \/ trillion operations per second) of less than 4,800.<\/li>\n<\/ul>\n\n\n\n<p>Nvidia tailored its H800 range of chips, used for DeepSeek&#8217;s training, to the Chinese market:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduced the interconnect bandwidth (chip-to-chip communication speed) from the A100&#8217;s 600 GB\/s to 400 GB\/s.<\/li>\n\n\n\n<li>Made double-precision operations practically unusable by reducing their speed to 1 TFLOPS.<\/li>\n\n\n\n<li>Kept the NVLink count at 8, compared to the H100&#8217;s 18. 
NVLink is a high-speed bus which accelerates GPU-to-GPU communication.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"816\" height=\"287\" src=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image.png\" alt=\"\" class=\"wp-image-128\" srcset=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image.png 816w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-300x106.png 300w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-768x270.png 768w\" sizes=\"auto, (max-width: 816px) 100vw, 816px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.fibermall.com\/blog\/wp-content\/uploads\/2024\/06\/chip-comparison.png\">A100 versus H800 performance<\/a><\/figcaption><\/figure>\n\n\n\n<p>The 1.76-trillion-parameter GPT-4 was trained on <a href=\"https:\/\/interestingengineering.com\/innovation\/nvidia-ai-gpu-openai#:~:text=These%20capabilities%20will%20advance%20OpenAI's,GPUs%20for%20about%20100%20days.\">25,000 A100 GPUs for 100 days<\/a>. In contrast, the 671-billion-parameter DeepSeek-V3 model was trained on <a href=\"https:\/\/www.tomshardware.com\/tech-industry\/artificial-intelligence\/chinese-ai-company-says-breakthroughs-enabled-creating-a-leading-edge-ai-model-with-11x-less-compute-deepseeks-optimizations-highlight-limits-of-us-sanctions\">2,048 H800 GPUs for about 60 days<\/a>.<\/p>\n\n\n\n<p>To achieve this, DeepSeek bypassed CUDA, the software layer which ships with Nvidia chips and is used across the industry. Instead, they used Nvidia&#8217;s low-level PTX (Parallel Thread Execution) language to implement optimized functions. 
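<\/p>\n\n\n\n<p>As a rough sanity check on the scale gap in those training figures, the raw GPU-days can be compared directly. This is only a ballpark, since it ignores per-chip speed differences between the A100 and the H800:<\/p>\n

```python
# GPU-days implied by the reported training runs (ballpark only:
# this ignores per-chip throughput differences between A100 and H800).
gpt4_gpu_days = 25_000 * 100   # 25,000 A100s for ~100 days
v3_gpu_days = 2_048 * 60       # 2,048 H800s for ~60 days

print(gpt4_gpu_days)                          # 2500000
print(v3_gpu_days)                            # 122880
print(round(gpt4_gpu_days / v3_gpu_days, 1))  # 20.3
```

\n<p>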
PTX sits one level above the machine code that operates the GPU&#8217;s processing cores.<\/p>\n\n\n\n<p>This allowed <a href=\"https:\/\/www.tomshardware.com\/tech-industry\/artificial-intelligence\/deepseeks-ai-breakthrough-bypasses-industry-standard-cuda-uses-assembly-like-ptx-programming-instead\">20 out of 132 streaming multiprocessors<\/a> to be reserved for server communication, compensating for bandwidth limitations, presumably by compressing data streams. The DeepSeek engineers also implemented advanced pipeline algorithms for fine-tuning threads \/ warps, which is notoriously difficult to do and is a testament to their skill.<\/p>\n\n\n\n<p>DeepSeek-V3 also uses a Mixture-of-Experts (MoE) architecture: a collection of sub-models, each fine-tuned for specific tasks, which only activate when needed. This allows only <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-V3\">37 billion of the 671 billion parameters<\/a> to be activated for each token, providing further efficiency gains. 
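<\/p>\n\n\n\n<p>The routing idea can be sketched in a few lines of Python. This is a toy illustration of top-k expert gating, not DeepSeek&#8217;s actual code; the expert and router functions are invented for the example:<\/p>\n

```python
# Toy sketch of Mixture-of-Experts routing (illustrative, not DeepSeek's code):
# a router scores every expert for each token, and only the top-k experts
# run, so only a fraction of the total parameters is active per token.

def route_top_k(scores, k):
    """Indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_forward(token, experts, router, k=2):
    scores = router(token)            # one score per expert
    active = route_top_k(scores, k)   # only k experts activate
    total = sum(scores[i] for i in active)
    # Weighted sum over the active experts only; inactive experts never run.
    return sum(experts[i](token) * scores[i] / total for i in active)

# Hypothetical setup: four tiny "experts", each just scaling the input.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
router = lambda x: [0.1, 0.6, 0.2, 0.1]   # fixed scores for illustration
print(moe_forward(10.0, experts, router, k=2))   # 22.5
```

\n<p>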
However, the MoE architecture is not new and is also reportedly <a href=\"https:\/\/medium.com\/@seanbetts\/peering-inside-gpt-4-understanding-its-mixture-of-experts-moe-architecture-2a42eb8bdcb3\">used by GPT-4<\/a>.<\/p>\n\n\n\n<p>Compared with its counterparts, DeepSeek-V3 achieves better accuracy on most benchmarks:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"598\" src=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4-1024x598.png\" alt=\"\" class=\"wp-image-133\" srcset=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4-1024x598.png 1024w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4-300x175.png 300w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4-768x449.png 768w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4-1536x897.png 1536w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-4.png 1702w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-V3\">DeepSeek-V3 benchmark comparison<\/a><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Market Reaction<\/h2>\n\n\n\n<p>The economic effect of the release has been noticeable. The DeepSeek chat app overtook the ChatGPT app to become the most downloaded on the iOS App Store. 
Nvidia&#8217;s share price also fell by almost 20%, as investors questioned whether the newest chips are a necessity for training the latest generation of AI models:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"629\" height=\"200\" src=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-2.png\" alt=\"\" class=\"wp-image-131\" srcset=\"https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-2.png 629w, https:\/\/artificial-intelligence.news\/wp-content\/uploads\/2025\/02\/image-2-300x95.png 300w\" sizes=\"auto, (max-width: 629px) 100vw, 629px\" \/><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/www.marketwatch.com\/investing\/stock\/nvda\">Nvidia share price fell 20% on 27th January, in response to DeepSeek news.<\/a><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Open Source<\/h2>\n\n\n\n<p>Possibly more interesting is the fact that DeepSeek is open source, with its code available on <a href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-V3\">GitHub<\/a>, in contrast to &#8220;OpenAI&#8221; models, whose source code is not available to the general public. <\/p>\n\n\n\n<p>The code may have been made open source to foster innovation, but also to help allay fears that DeepSeek is being used to exfiltrate data, a concern which has led countries such as <a href=\"https:\/\/www.aljazeera.com\/news\/2025\/2\/6\/which-countries-have-banned-deepseek-and-why\">the U.S., South Korea, Taiwan and Australia<\/a> to ban its use on government devices.<\/p>\n\n\n\n<p>However, non-government consumers who want to protect their data have the option to <a href=\"https:\/\/huggingface.co\/deepseek-ai\/DeepSeek-V3\">download the model<\/a> and run it on an isolated network. 
<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The fable of &#8220;The Crow and the Pitcher&#8221; was commented on in the 4th-5th century by Avianus, that &#8220;thoughtfulness is superior to brute strength&#8220;. This echoed true in the latest news from the quickly evolving AI frontier. When DeepSeek released their open source DeepSeek-R1 model, based on DeepSeek-V3 on 20th January 2025. This is a [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":139,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[1],"tags":[],"class_list":["post-125","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/125","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=125"}],"version-history":[{"count":11,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/125\/revisions"}],"predecessor-version":[{"id":143,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/125\/revisions\/143"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/artificial-intelligen
ce.news\/index.php?rest_route=\/wp\/v2\/media\/139"}],"wp:attachment":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=125"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=125"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=125"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}