{"id":61,"date":"2024-02-18T20:17:15","date_gmt":"2024-02-18T20:17:15","guid":{"rendered":"https:\/\/artificial-intelligence.news\/?p=61"},"modified":"2024-02-18T20:22:18","modified_gmt":"2024-02-18T20:22:18","slug":"openai-sora-beneath-the-surface","status":"publish","type":"post","link":"https:\/\/artificial-intelligence.news\/?p=61","title":{"rendered":"OpenAI Sora: Beneath the surface"},"content":{"rendered":"\n<p>If you are not already aware, Sora is the latest in a series of groundbreaking models released by OpenAI. It can create photorealistic short videos from a text prompt.<\/p>\n\n\n\n<p>Here&#8217;s an example it produced of two ships sailing in a storm in a coffee cup:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Ships in Coffee by OpenAI Sora\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/vLCqSUUOmy0?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><figcaption class=\"wp-element-caption\">Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">How it works<\/h2>\n\n\n\n<p>The OpenAI research paper <a href=\"https:\/\/openai.com\/research\/video-generation-models-as-world-simulators\">Video generation models as world simulators<\/a> describes Sora as a text-conditional diffusion model that uses a transformer architecture operating on spacetime patches.<\/p>\n\n\n\n<p>A <strong>diffusion model<\/strong> gradually adds noise to its training data in the forward diffusion process, and then learns features by removing that noise in the reverse diffusion process.<\/p>\n\n\n\n<p>The 
<strong>transformer<\/strong> architecture is famously used in ChatGPT, which is built on a large language model (LLM). Transformers work using an &#8220;attention&#8221; mechanism, which looks at the surrounding context to help learn patterns.<\/p>\n\n\n\n<p>A <strong>spacetime patch<\/strong> is a compressed representation of a video expressed in a low-dimensional latent space. It is called a spacetime patch because both spatial and temporal information are compressed.<\/p>\n\n\n\n<p>To produce the final video output, a decoder model has also been trained to map the latent representations back to pixel space.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Emergent capabilities<\/h2>\n\n\n\n<p>On the surface, one may assume that Sora is simply mapping pixels from one frame to the next. In a sense it is doing this, but also much more.<\/p>\n\n\n\n<p>In the example video of the two ships in a coffee cup, notice how naturally the waves form and how the ships move in a way that seems intuitive. This suggests that Sora has some understanding of <strong>real-world physics<\/strong>, which has traditionally been hand-coded using physics equations.<\/p>\n\n\n\n<p><strong>Coherence<\/strong> and <strong>permanence<\/strong> are also among Sora&#8217;s achievements. 
When producing a shot with a moving camera, objects do not simply disappear or randomly change position when they come back into frame.<\/p>\n\n\n\n<p>Combining these capabilities, it is conceivable not only that Sora lays the groundwork for blockbuster movies, but also that it may be used to create virtual game worlds.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/www.youtube.com\/watch?v=CJbyjlOeBQY\n<\/div><figcaption class=\"wp-element-caption\">Minecraft world created by Sora<\/figcaption><\/figure>\n\n\n\n<p>Although Sora is currently open to only a few people and still struggles with occasional glitches, it is exciting to think about where we will be a few iterations down the line.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you are not already aware, Sora is the latest in a series of groundbreaking models released by OpenAI. It can create photorealistic short videos from a text prompt. 
Here&#8217;s an example it produced of two ships sailing in a storm in a coffee cup: How it works The OpenAI research paper Video generation models [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":64,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[1],"tags":[],"class_list":["post-61","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/61","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=61"}],"version-history":[{"count":1,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/61\/revisions"}],"predecessor-version":[{"id":63,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/posts\/61\/revisions\/63"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=\/wp\/v2\/media\/64"}],"wp:attachment":[{"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=61"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&pos
t=61"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/artificial-intelligence.news\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=61"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}