Large language models (LLMs) like ChatGPT and Claude have become household names around the world, and many people worry that AI is coming for their jobs. It is therefore ironic that nearly every LLM-based system struggles with the simple task of counting the number of "r"s in the word "strawberry." The failure isn't limited to the letter "r": other examples include counting the "m"s in "mammal" and the "p"s in "hippopotamus." This article analyzes the causes of these failures and offers simple workarounds.
LLMs are powerful AI systems trained on vast amounts of text to understand and generate human-like language. They excel at tasks such as answering questions, translating languages, summarizing content and even generating creative writing, by predicting and constructing coherent responses based on the input they receive. LLMs are designed to recognize patterns in text, which lets them handle a wide range of language-related tasks with impressive accuracy.
Despite these abilities, an LLM's failure to count the "r"s in "strawberry" is a reminder that these models cannot "think" like humans. They do not process the information we give them the way a human would.
Almost all modern high-performance LLMs are built on transformers. This deep learning architecture does not ingest text directly; it uses a process called tokenization to convert text into numerical representations, or tokens. Some tokens are full words (like "monkey"), while others are fragments of words (like "mon" and "key"). Each token is like a code the model understands. By breaking everything into tokens, the model can more accurately predict the next token in a sentence.
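The effect can be sketched with a toy greedy longest-match tokenizer. The vocabulary below is made up purely for illustration; real tokenizers (such as byte-pair encoding) learn their vocabularies from training data, but the consequence is the same: the model sees token IDs, not individual letters.

```python
# Toy vocabulary, invented for this illustration only.
TOY_VOCAB = ["straw", "berry", "mon", "key"]

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(word):
        match = next(
            (v for v in sorted(TOY_VOCAB, key=len, reverse=True)
             if word.startswith(v, i)),
            word[i],  # fallback: emit the bare character
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("strawberry"))  # → ['straw', 'berry']
print(tokenize("monkey"))      # → ['mon', 'key']
```

Neither "straw" nor "berry" carries any explicit information about how many "r"s it contains; to the model, each is just an opaque ID.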
LLMs do not memorize words as sequences of letters. They learn how tokens fit together in different contexts and become very good at guessing what comes next. For the word "hippopotamus," the model may see tokens like "hip," "pop," "o" and "tamus" without ever knowing that the word is made of the individual letters "h," "i," "p," "p," "o," "p," "o," "t," "a," "m," "u," "s."
A model architecture that could look directly at individual characters without tokenizing them might avoid this problem, but it is not computationally feasible with today's transformer architectures.
Additionally, consider how LLMs generate output text: they predict the next token based on the previous input and output tokens. This works well for producing context-aware, human-like text, but it is unsuited to simple tasks like counting letters. Asked how many "r"s are in the word "strawberry," an LLM predicts an answer based purely on the structure of the input sentence.
Here's a workaround
While LLMs may not be able to "think" or reason logically, they are adept at understanding structured text. A great example of structured text is computer code, in many programming languages. If you ask ChatGPT to use Python to count the number of "r"s in "strawberry," it will most likely get the correct answer. When an LLM needs to perform counting or any other task that requires logical reasoning or arithmetic, the broader software can be designed so that the prompts ask the LLM to use a programming language to process the input query.
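As a sketch, the Python such a prompt would elicit is essentially a one-liner: `str.count` operates on the characters themselves, sidestepping tokenization entirely.

```python
word = "strawberry"
print(word.count("r"))  # → 3

# The same approach handles the article's other examples:
print("hippopotamus".count("p"))  # → 3
print("mammal".count("m"))        # → 3
```

Because the code runs deterministically on the raw string, the answer no longer depends on how the model happened to tokenize the word.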
Conclusion
A simple letter-counting experiment exposes a fundamental limitation of LLMs such as ChatGPT and Claude. Despite their impressive ability to generate human-like text, write code and answer questions posed to them, these AI models still cannot "think" like humans. The experiment shows the models for what they are: pattern-matching prediction algorithms, not "intelligence" capable of understanding or reasoning. However, knowing in advance what types of prompts work well can mitigate the problem. As AI becomes increasingly integrated into our lives, recognizing these models' limitations is crucial for responsible use and realistic expectations.
Chinmay Jog is a senior machine learning engineer at Pangiam.