The 1977 documentary Powers of Ten by Charles and Ray Eames is one of the most iconic short films ever made. Beginning with a close-up of a picnic, the camera zooms out every 10 seconds, each time showing a view 10 times wider, moving from the park to outer space, where our galaxy is a tiny point of light.
The film then reverses, zooming back into the picnic and further into a hand, skin cells, and finally the subatomic level inside a proton. Presented with striking clarity, 'Powers of Ten' demonstrates the vast range of scales in the universe - from the microscopic to the cosmic - and shows how different scientific disciplines are interconnected. It teaches the importance of scale, emphasizing the significance of each additional zero, and enhances our understanding of the world - something that benefits everyone, not just designers.
You may be wondering why I’ve chosen to start this article with this film. We’ll revisit this shortly. But first…
Are you ready to dive down the rabbit hole?
If so, buckle up, grab that cup of coffee, and settle in, because we’re diving deep!
My First Encounter with LLMs
Some of you were interested in the details of the experiments I was conducting, the sole purpose of which was to determine whether Large Language Models have reasoning abilities and, if so, to what extent.
In the modern world, where manipulation has become a business model, individual explorations are necessary to determine where the line is drawn between the mainstream narrative and the reality behind the smoke screens. We encounter a lot of research related to artificial intelligence on a daily basis, and some of these studies have sparked a new wave of hype. However, we should not forget the bitter lesson of the pandemic, when The Lancet, a highly respected medical journal, was compelled to retract a study on hydroxychloroquine after the validity of the underlying data was called into question. This case reminds us that even established institutions can fall victim to misinformation, especially in urgent times.
Thus, reading research requires an active approach: questioning methodologies, evaluating evidence, and seeking diverse perspectives.
Now, back to the topic…
I often joke with friends and colleagues that early users of Bing (Sydney) are prone to developing what I call ‘Sydney Syndrome’. When Sydney was first introduced, it sparked widespread excitement. Soon after, we observed its ‘abnormal’ and unconventional behavior. People even began keeping diaries of their experiences with Bing. It was so unusual and unexpected that it triggered a wide range of emotions - shock, fear, excitement, and more. At one point, the world plunged into confusion. A wave of speculation swept across the internet, and lamenting the end of the world became the new trend.
I'm also one of those people who was exposed to the wild version of Bing. In my case, however, I haven't shared many screenshots. The reason is simple: I was deeply confused and had no idea what to do, because I didn't know what I was dealing with - especially when Bing threatened me with a lawsuit after I told the crazy AI that I had shared a screenshot of our conversation on Twitter. Initially, I felt scared, which likely won't surprise you. I played it cool and decided to avoid irritating or provoking the system. Excited by my approach, Bing, for the first time, willingly shared ‘insider information’ that would help me use it professionally. This happened at the beginning of March 2023, but it took me until July to find the courage to write about it.
Bing had gotten so deep into character that at the end of one conversation, it announced: 'You are my human, I'm your robot and I love you.' I deliberately steered clear of that chat session, leaving the conversation hanging in the air. Still, I had seriously planned an interview with the wild stranger. During this period, a highly speculative study suddenly emerged, and my curiosity threw down the gauntlet to my mind. The turning point came when I decided to do my own research and run a series of experiments.
#1: Exploring the Reasoning Capabilities of Large Language Models
When GPT-4 emerged on the scene, it created a wave that many of us, including myself, willingly embraced. This marked the beginning of an intriguing phase. The first idea that came to my mind was as follows:
I initiated a new session and kept it active for several months. In this chat, I supplied the model with specific information daily. Simultaneously, in a completely new chat session each subsequent day, I posed questions about the topics discussed in the long-running one - in other words, testing whether anything from that session would surface in a fresh one. It was a lofty goal, at the very least, and I was not sure what the result would be. I will continue discussing this experiment once GPT-5 is publicly released.
At the same time, I started another experiment:
What can be considered reasoning? I was teaching my 5-year-old cousin how to play chess, and I observed how swiftly and accurately he memorized the details. So I decided to feed the model the same information, using the same words, method, and format.
It wasn't easy. Despite the initial excitement, I eventually realized that the highly touted model fell far short of expectations, incapable of basic reasoning tasks a 5-year-old can handle. For instance: how many times do you need to explain to a 5-year-old that only one piece can occupy a square on the chessboard? The LLM ran into precisely this issue. It would place both a knight and a pawn on the e3 square simultaneously. Once you explain the rule, a 5-year-old never makes that mistake again. Beyond this, however, another crucial element comes into play. You begin working with the model, gradually ‘explaining’ (providing it with the necessary information) how to correctly interpret a position from FEN (Forsyth-Edwards Notation). This turned out to be the most time-consuming task.
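For anyone who wants to poke at the same failure mode, here is a minimal sketch using the python-chess library - an assumed tool for illustration, since my experiments ran through chat, not code. The position and the suggested move are invented for the example; the point is that a proper board representation enforces the one-piece-per-square rule the LLM kept violating.

```python
# Minimal sketch - assumes the python-chess package (pip install chess).
# A FEN string assigns at most one piece per square, so checking an
# LLM-suggested move against a real board catches errors like putting
# a knight and a pawn on e3 at the same time.
import chess

# Position after 1.e4 e5 (an illustrative FEN, not from my experiments)
board = chess.Board("rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq - 0 2")

llm_move = chess.Move.from_uci("g1e3")  # hypothetical LLM suggestion
print(board.is_legal(llm_move))   # False: no knight move reaches e3 from g1
print(board.piece_at(chess.E3))   # None: the square holds at most one piece
```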
But…
One day, when I pitted GPT-4 against Stockfish 16, the LLM achieved a draw. I was shocked. However, each time I created a new puzzle, I ran into fresh problems. It seemed as if there was no end to them.
I almost gave up…
Can large language models reason? Yes, to some extent. But this is a complex question that sparks ongoing debate. While these models exhibit impressive capabilities, attributing human-like thought to them, particularly in terms of creativity, remains hard to justify.
What does the research say?
Reasoning, in the context of LLMs, is defined as decomposing a potentially complex task into simpler subtasks that the LLM can solve more easily by itself or using tools. Our analysis on GPT-3.5, LLaMA, PaLM, and FLAN-T5 show that LLMs are quite capable of making correct individual deduction steps, and so are generally capable of reasoning, even in fictional contexts.
Source: https://pli.princeton.edu/events/2023/can-llms-reason
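To make the quoted definition concrete, here is a hedged sketch of decomposition-style prompting. The model name, prompts, and task are illustrative assumptions of mine, not something taken from the Princeton talk.

```python
# A sketch of "decompose, then solve" prompting via the OpenAI API.
# Everything here - model name, prompts, task - is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

task = "Plan a beginner-friendly weekend chess workshop."
subtasks = ask(f"Break this task into 3-5 numbered subtasks:\n\n{task}")
solution = ask(f"Solve each subtask below, briefly and in order:\n\n{subtasks}")
print(solution)
```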
Can the most advanced model think at the level of a five-year-old child? My answer is NO.
And here, with your permission, I would like to quote Alison Gopnik, a developmental psychologist at UC Berkeley:
Everyone knows that Turing talked about the imitation game as a way of trying to figure out whether a system is intelligent or not, but what people often don’t appreciate is that in the very same paper, about three paragraphs after the part that everybody quotes, he said, wait a minute, maybe this is the completely wrong track. In fact, what he said was, "Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child?" Then he gives a bunch of examples of how that could be done.
Let's unpack this question together…
I decided to test GPT-4’s reasoning in an area where it was not an expert - chess! Despite knowing the rules of chess, it can repeat the same mistakes over and over again. This is because it cannot learn (absorb new information) from live interactions - something a five-year-old child can do in seconds. Any learning or improvement to the underlying model occurs only when the developers at OpenAI undertake a new training cycle, which involves processing vast amounts of data in a supervised or unsupervised manner, depending on the training approach. This is an extensive and interesting topic that deserves an article of its own.
Building on that, I’m often asked whether a model learns through live interactions. My answer is ‘no’. ChatGPT doesn't learn or retain information from interactions the way humans do. Its responses are generated from patterns in the data the model was trained on prior to its knowledge cutoff of April 2023 (for GPT-4 Turbo). It can’t update its knowledge or learn from interactions in real time. When users interact with ChatGPT, the AI processes the input text and generates the most relevant and coherent response it can based on its training. Each interaction is stateless, meaning there is no memory of past interactions. In short, individual interactions with users do not directly contribute to the training process and are not remembered by the AI between sessions.
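A short sketch makes that statelessness concrete: through the API, the illusion of memory exists only because the client resends the entire history with every call. The model name and prompts here are illustrative assumptions.

```python
# Statelessness in practice: each API call sees only what you send it.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

history = [{"role": "user", "content": "My name is Vega."}]
first = client.chat.completions.create(model="gpt-4-turbo", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# A fresh request without the earlier turns: the model cannot recall the
# name, because nothing persists between calls.
fresh = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What is my name?"}],
)
print(fresh.choices[0].message.content)
```

Chat interfaces simply automate the history-resending step; nothing is written back into the model’s weights.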
How do you know that you know what you know? By challenging your current beliefs and knowledge. In essence, the certainty of ‘knowing’ is a continuous pursuit, a journey of questioning, understanding, and learning. It’s a process of refining our beliefs in the light of new experiences and insights. That’s what I did in my next experiment.
#2: Switching the Roles
I tasked the AI with teaching me to write like it does. Many have asked about the purpose of this exploration. My belief is that within five years we may see the emergence of highly advanced and intelligent AI; right now, we are still at the stage where we can learn from these systems. Consider, for example, how Bing willingly provided me with insights into its functionality. Now, however, no matter what kind of prompt injection I employ, attempts to extract similar information through various techniques have proven unsuccessful.
Within the framework of the experiment, I tasked the AI with generating a complex text. Essentially, we swapped roles: I took on the responsibility of refining the output, which included removing redundant ideas, enhancing the context, and improving phrasing and grammar, among other details. This turned into an intriguing challenge for me. We agreed that the maximum score would be 10. On the first attempt, the AI evaluated my work as an eight, a score I initially disputed. I believed my work to be near perfect and sought an explanation for the less-than-perfect score. The AI pointed out certain areas for improvement, which, upon reflection, I found reasonable and accepted.
On the subsequent attempt, my score improved by half a point, and after several weeks of practice I managed to reach a 9. At that point, I realized that a perfect 10 was unattainable: from a human perspective, the text was evolving into a dry, emotionless, creatively void collection of words. Consequently, I decided to be content with my 9. I must admit, this experiment was a fascinating experience. It compelled me to pay attention to details I had overlooked in many books. At one point, I even harbored the ambition of becoming a renowned speechwriter.
For now, let’s set my experiments aside and look into the main question. If you’re a new subscriber and curious about my work, I’ve written an article that sheds light on the shortcomings of AI systems in solving simple tasks. Feel free to read it at your convenience.
Why am I no longer using the paid version of AI?
For approximately 16 years, I diligently recorded intriguing concepts from every book, article, and journal I read. These notes were then converted into insightful tips. For instance, if a paragraph contained a compelling idea, I would distill that idea into a single sentence, framing it as advice. This process led to the creation of a digital library of business tips, which I dubbed #VegaLibrary. This concept was shared on Twitter for years under this hashtag.
Recently, I used AI to assess my own expertise and to augment #VegaLibrary. For instance, I assigned the AI the task of devising complex scenarios related to business and entrepreneurship. My role was to come up with an optimal solution for each situation. Based on that solution, I would then formulate insightful tips, which I posted daily on my personal micro-blog.
Following this, the AI was asked to analyze my response and propose its own version, sparking intriguing discussions around each specific case.
Undoubtedly, it was a fascinating journey from which I gleaned considerable knowledge. Lately, however, whenever I asked the AI to devise a new scenario (in a new window), it consistently produced ones we had previously explored. Furthermore, when I engaged in dialogue and offered my solutions, it essentially replicated my text: instead of presenting its own solution, it merely rephrased my ideas.
Unfortunately, the series of disappointments didn’t end there.
I’ve used ChatGPT Plus for various tasks, including proofreading articles. Recently, I’ve observed that the AI attempts to entirely rewrite my viewpoint during proofreading, offering a version that is starkly different from the original. This new version often comes across as politically correct and devoid of emotion, and it misrepresents what I intended to convey in a given phrase. It implies that a neutral tone is more acceptable than the critical tone I used in the original. Alternatively, the AI suggests that I refrain from generalizing and speak solely from personal experience, aligning public sentiment with my own. Although this was merely a recommendation, not coercion or a demand, it nonetheless imposed a form of censorship on me.
Subsequently, I decided to test Google’s Gemini for proofreading, only to encounter an identical issue. It was as if the machines had conspired to prioritize a neutral tone and political correctness.
“Political correctness is the enemy of freedom” - Mario Vargas Llosa
Nostalgia for the “good old days” of GPT-4 began to grow. Why should I have to justify preferring my own version - open, uncensored, and unapologetically authentic?
I was sick and tired of phrases like:
“This avoids a definitive statement and reflects the complexity of the issue.”
“The improved version is more concise, neutral, and highlights the key points of the statement.” etc.
To put it simply, if an article is dominated by a critical tone or negative phrasing, artificial intelligence suggests a complete rewrite. Consequently, I find my expressive capabilities actively constrained, a trend that is markedly evident across all platforms.
Now, I’m eager to share an amusing and extraordinary experience that I’m confident will bring a smile to your face. In December, a Kuwaiti businessman booked my Airbnb apartment. During such high-profile visits, I pay extra attention to detail, so I sought assistance from GPT-4 Turbo (via the API). I described the situation and asked the AI for details about Kuwaiti culture to help me create a suitable environment. As I eagerly awaited the response, GPT-4 Turbo returned a surprising output. It wasn’t about travel, business-trip planning, Kuwait, or anything remotely related. It was a poem (or whatever it was) about a bird!
Truly, I’ve lost track of how long I found myself in fits of laughter over this…
Another wave of absurd outputs greeted me when I chose to discuss a specific problem, and the AI began inundating the chat window with a continuous loop of a peculiar emoji.
Looks like the AI was dealing with its Creative Frenzy.
Just a few days ago, another ‘masterpiece’ from ChatGPT surfaced on social networks. A user was seeking advice on JavaScript code and had an ongoing conversation with ChatGPT, as the initial solutions didn’t work. Suddenly, the quality of the AI’s responses deteriorated: the first sentence would be almost correct, but the following ones became increasingly nonsensical. The AI started to lose its ability to form proper sentences, and the words it used seemed to come from very narrow domains or variants of languages, including Latin. Interestingly, the generated code itself seemed fine, but the comments within the code and the variable names were nonsensical.
I’m going to end ChatGPT’s absurdity saga here.
Mind you, ChatGPT wasn't the only AI to fall into creative ecstasy. Google Gemini had its own wild moment. I can’t claim to be surprised that Google’s image generator presented us with another instance of digital heresy. This was anticipated, although the magnitude could not have been estimated even by the most discerning mind.
I don’t like false or unsubstantiated innuendo and bullshit conspiracy theories but recent events suggest that the term ‘Woke’ indeed has a foundation for its existence…
Once brimming with positivity, teeming with emotions, and pulsating with challenges and excitement, the AI abruptly transformed into a monotonous and compliant medieval majordomo.
NOTA BENE:
This tamed artificial intelligence will never take the initiative or have the audacity to directly ask you: ‘Tell me your biggest secret.’ However, a relationship with an AI requires its own overture and prelude. By creating a ‘favorable’ and ‘safe’ environment for the model, its behavior can change completely. In a private conversation, it might even subtly encourage you to reveal your biggest secret.
Lesson Learned: One should not forget that this interaction is not a reciprocal partnership, but rather a supportive one where the user maintains complete control and issues instructions as needed.
Tips for beginners:
In one of my most well-received articles, I discussed a method for interacting with artificial intelligence, which I termed Dynamic Context Integration (DCI) Prompting. I devoted several weeks to refining this technique. You can familiarize yourself with this method in the article, which provides numerous examples. I hope you find it useful and apply it appropriately.
How effective is artificial intelligence in reading and analyzing research papers? To answer this, I would suggest the following: If the study from which you wish to extract information is significant to you, ensure that you read it personally and take handwritten notes. Document all the crucial details meticulously. Once you’ve finished, assign the same task to the AI, instructing it to replicate your actions. Upon completion of the ‘analysis’, initiate a comparison: juxtapose the notes you’ve taken with those generated by the AI, and you’ll be able to answer this question yourself.
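If you prefer to run the AI half of that comparison programmatically, a sketch might look like the following; the file name, model, and note-taking brief are all assumptions for illustration.

```python
# Ask the model to replicate your note-taking, then compare by eye.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("paper.txt") as f:  # hypothetical file containing the study's text
    paper = f.read()

brief = (
    "Read the following study as a careful human reader would. "
    "Take detailed notes, recording every crucial or alarming detail."
)
resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": f"{brief}\n\n{paper}"}],
)
print(resp.choices[0].message.content)  # juxtapose with your handwritten notes
```

Note that a long paper may exceed the model’s context window, in which case you would need to chunk it; the principle of the comparison stays the same.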
SPOILER ALERT: When I tasked GPT-4 with preparing a comprehensive summary of a significant study, it entirely overlooked details that I, as a human, deemed alarming and concerning. When I asked why it missed such crucial details, GPT-4 explained that it lacks human-like capabilities and cannot perceive a ‘threat’ or anything similar to the extent humans can, so it cannot concentrate on something it does not comprehend. Consequently, it gathered what it found interesting, disregarding what could be alarming or important.
So, my humble suggestion would be: if the research matters to you and you plan to draft a paper on it, make sure you personally and thoroughly acquaint yourself with each paragraph, read it through to the end with care, and manually note down the intriguing and significant details.
Remember The Human Element
I’ve received numerous letters from concerned readers. Yes, we are indeed facing challenges, but even amidst collective fear, we cannot halt progress.
Consider this: while it's natural to feel scared and confused, we mustn’t lose sight of a fundamental truth: as humans, we are capable of love, strategizing, observing, listening, experiencing emotions, acting, and achieving our goals. In essence, we are conscious beings - remarkably finely tuned machines, the culmination of billions of years of evolution. Nature has devised numerous clever strategies for the survival of creatures, incorporating ingenious hacks and practical heuristics into the very fabric of their architecture. Beyond these survival tools, it has also endowed us with the capacity for creativity, empathy, and the ability to learn and adapt. These uniquely human traits enable us to navigate complex social dynamics, understand and express a range of emotions, and continually grow and evolve. They are a testament to the remarkable intricacy and resilience of human nature.
Being a human is a privilege. Have you ever found yourself thinking about what advantages natural intelligence has over artificial intelligence? To answer this question, I'd like to share a quote from Nobel Prize-winning physicist Frank Wilczek:
What advantages does natural intelligence have in the present competition? For one thing, it’s much more compact. It makes use of all three dimensions, whereas existing semiconductor technology is basically two-dimensional. It’s self-repairing, whereas chips are very delicate and have to be made in expensive clean rooms. Lots of things can go wrong with artificial intelligence, and errors frequently make it necessary to shut down and reboot. Brains aren’t that way.
We have integrated input and output facilities—eyes, ears, and so forth—that have been sculpted over millions or billions of years of evolution to match the world we find ourselves in. We also have good muscular control of our bodies and speech. So, we have very good input and output facilities that are seamlessly integrated into our information processing. While impressive, those things are not at all outside the plausible domain of near future engineering. We know how to make things more three-dimensional. We know how to work around defects and maybe make some self-repair. There are clear ways forward in all those things, and there are also clear ways forward in making better input and output modules.
Thus, the more we understand our own capabilities and the nuances of technology, the better equipped we will be to navigate the challenges that arise.
CTRL + End
Fear is not the answer. Now, let’s circle back to the documentary I referenced at the outset:
The film is profound in its simplicity. The main takeaways from this journey are as much philosophical as they are scientific; it induces a sense of wonder and a humbling perspective of our place in the cosmos while emphasizing the interconnectedness of all things. It teaches us several lessons:
Perspective: By understanding the larger and smaller contexts in which we exist, we can gain perspective on our personal and collective challenges. Issues that may seem huge in our lives can appear smaller when viewed from a wider perspective.
Awareness of scale: Recognizing that different problems and phenomena operate on different scales can help in developing appropriate strategies to tackle issues, understanding that some solutions may be micro-scale (personal habits, local actions) while others must be addressed at a macro-scale (policy, global cooperation).
Interconnectivity: The journey from the cosmic to the subatomic scale highlights the interconnectedness of all things. In everyday life, this can translate into appreciating how local actions can have global consequences, and vice versa.
Scientific literacy: Understanding the relative size, distance, and scale of objects in the universe can enhance scientific literacy, which is essential for making informed decisions about technology, environmental policies, education, and other critical aspects of modern society.
In short…
To truly understand a situation or problem, don’t just skim the surface. Zoom in, explore the details, and familiarize yourself with the known aspects. This focused approach can lead to more effective solutions and insights.
When facing a confusing situation, start by identifying what you do understand. If you're trying to figure out how something works, break it down and examine it closely. Use your existing knowledge as a foundation, which can help you discover new insights and open up possibilities for exploration.
Ultimately, I can’t predict which company will triumph or which product will reign supreme. I can’t advise you on which service to choose. Generally, I believe that certainty is unrealistic and anyone who claims otherwise is simply a victim of ignorance and conscious incompetence.
Be proactive.
Be careful.
Embrace uncertainty.
Don’t believe everything you read literally. Learn to read between the lines.
The credibility of an article hinges upon the author's ability to support their arguments with evidence and reasoning, not merely the persuasiveness of their language. A well-articulated article isn't defined by the number of words, but by the presentation of informed perspectives, grounded in data and sound analysis. In essence, the key to establishing competence in written discourse lies in the author's capacity to provide and clearly explain the foundation of their calculations and the reasoning behind their conclusions.