ChatGPT Strawberry launches: o1 model

The long-awaited ChatGPT model, codenamed Strawberry, has officially launched as the o1 model. As usual, the results are groundbreaking and take us a step closer to the ultimate goal of creating an AGI (Artificial General Intelligence).

Chain of thought

Previous ChatGPT models were prone to forgetting what had been mentioned earlier in the conversation, even things they had said themselves. This has been addressed in the new o1 models with the chain of thought capability. Simply put, they are capable of delivering step-by-step answers that follow a logical chain of reasoning.
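
For developers, the difference mostly shows up in how the models are prompted. The sketch below, using the OpenAI Python client, is illustrative only: the question is made up, and with earlier models you typically had to ask for step-by-step reasoning explicitly, whereas the o1 models run their chain of thought internally before returning the final answer.

```python
# Minimal sketch using the OpenAI Python client (pip install openai).
# Model names and the example prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train leaves at 09:12 and arrives at 11:47. How long is the journey?"

# With earlier models, step-by-step reasoning had to be requested explicitly.
legacy = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Think step by step. {question}"}],
)

# o1 models perform their chain of thought internally before answering,
# so a plain prompt is enough.
reasoned = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": question}],
)

print(legacy.choices[0].message.content)
print(reasoned.choices[0].message.content)
```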

This has allowed the o1 models to surpass PhD-level humans at answering science questions, and to far surpass ChatGPT 4o at competitive math and coding questions.

o1 performance compared to 4o. Solid bars are pass@1 accuracy, shaded bars are the consensus of 64 sample answers.
openai.com/index/learning-to-reason-with-llms

Chain of thought does come with a downside: the model spends time thinking before it responds, typically around 60 seconds.

ChatGPT 4o answering a simple physics question.
ChatGPT o1-preview answering the same physics question.

You can see the difference between the two models in the GIFs above, where both models are asked a question about a ball being dropped from a plane.

4o starts responding immediately, whereas o1-preview takes 51 seconds to think, working through the solution step by step. As a result, o1-preview provides a more detailed answer that not only explains each step, but also gives a more nuanced result that takes air resistance into account.
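
To give a feel for the nuance involved, here is a rough, self-contained sketch comparing an idealised drop with one that includes quadratic air resistance. The altitude, mass, and drag coefficient are assumed values for illustration, not figures taken from either model's answer.

```python
# Back-of-the-envelope comparison: free fall with and without air resistance.
# All numeric values below are assumed for illustration.
import math

g = 9.81          # gravitational acceleration, m/s^2
h = 2000.0        # drop altitude, m (assumed)
m = 0.45          # ball mass, kg (assumed)
k = 0.01          # quadratic drag coefficient, kg/m (assumed)

# Ideal free fall: h = 0.5 * g * t^2  ->  t = sqrt(2h / g)
t_ideal = math.sqrt(2 * h / g)

# With quadratic drag, integrate numerically using simple Euler steps.
t, v, fallen, dt = 0.0, 0.0, 0.0, 0.01
while fallen < h:
    a = g - (k / m) * v * v   # drag opposes the motion
    v += a * dt
    fallen += v * dt
    t += dt

print(f"Ignoring drag: {t_ideal:.1f} s, with drag: {t:.1f} s")
```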

Hiding the Chain of Thought

One interesting point made by OpenAI is that the chain of thought can be used to monitor what the model is “thinking”. This was previously a blind spot in LLMs: since no one fully understands their inner workings, it was not possible to see how they arrived at an answer.

OpenAI decided not to fully reveal the chain of thought to users. The reasons they give include maintaining a competitive edge and giving the model freedom of thought. The latter ties into OpenAI’s AI safety agenda: they want to monitor the model for signs that it is manipulating the user, something a policy restricting the chain of thought could hinder.

Q*

An early codename for the o1 model was Q*, which caused controversy when the OpenAI board fired the CEO, Sam Altman, for apparently misleading them about its capabilities as an AGI. Then, with the backing of Microsoft, Sam returned as CEO and many of the original board members were forced to resign.

An interesting point about Q* is that it gave an insight into how the new model was trained. The name references the Q-learning reinforcement learning algorithm, a type of machine learning in which an agent is given a goal and free rein to learn, through trial and error, how to achieve it. Over time the agent builds an internal representation of how its environment operates. The algorithm has been used to great effect in the past for teaching AI agents how to play games.
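
To give a flavour of what Q-learning looks like in practice, here is a minimal tabular sketch on a toy “walk to the goal” environment. The environment, rewards, and hyperparameters are illustrative; OpenAI has not disclosed the details of o1’s training setup.

```python
# Minimal tabular Q-learning on a toy corridor environment (illustrative only).
import random
from collections import defaultdict

N_STATES = 5              # states 0..4, goal at state 4
ACTIONS = [-1, +1]        # step left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = defaultdict(float)    # Q[(state, action)] -> estimated future reward

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit what has been learned, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0

        # Q-learning update: nudge the estimate towards
        # reward + discounted best future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

        state = next_state

# After training, the greedy policy should always step right towards the goal.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1, 1]
```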

An AI agent playing Super Mario, after being trained with reinforcement learning.

OpenAI is not the only organisation that has had success combining reinforcement learning (RL) with large generative models. Google recently released a paper on their GameNGen model, which generates a playable version of Doom. This was achieved by teaching an agent to play Doom using RL, then using that gameplay to create training data for a Stable Diffusion v1.4 model that renders the game.
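
As a rough illustration of the first stage of that pipeline, the toy sketch below logs (recent frames, action) → next frame pairs from a stub game. The stub environment and random policy stand in for the real Doom engine and RL agent; the logged pairs are the kind of data a conditional diffusion model would then be trained to reproduce.

```python
# Toy, runnable sketch of the data-collection idea behind a GameNGen-style pipeline.
# The stub environment and random policy below are illustrative only.
import random

class StubGame:
    """Stand-in for a real game engine: each 'frame' is just a labelled string."""
    def reset(self):
        self.t = 0
        return f"frame_{self.t}"

    def step(self, action):
        self.t += 1
        return f"frame_{self.t}_after_{action}"

def collect_training_pairs(env, n_steps=10, history_len=4):
    pairs, history = [], [env.reset()]
    for _ in range(n_steps):
        action = random.choice(["left", "right", "shoot"])  # an RL policy would go here
        next_frame = env.step(action)
        # Each sample: condition = (recent frames, action), target = next frame.
        pairs.append((tuple(history[-history_len:]), action, next_frame))
        history.append(next_frame)
    return pairs

for condition, action, target in collect_training_pairs(StubGame()):
    print(action, "->", target)
```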

A playable version of Doom generated by Google’s GameNGen diffusion model.

Conclusion

Chain of thought reasoning is the next logical step towards AGI, since it helps the model achieve the “common sense” that has eluded LLMs so far.

Other teams have attempted to fill this gap in the past, such as Devin, the AI developer. But with this new offering from OpenAI providing similar functionality, the barrier to entry for other teams to build something comparable is lowered.

As a result, this is likely to have an impact on knowledge-based industries such as software engineering, boosting productivity.