AI chatbots struggle with memory, often forgetting information between or within conversations. However, two recent breakthroughs hold the potential to revolutionize this limitation.
Hold a long enough conversation with a large language model (LLM) such as OpenAI’s ChatGPT and the model starts losing track of key information, with its performance declining significantly the longer the exchange runs.
ChatGPT and other LLMs also cannot retain information between conversations. End a chat and return to ChatGPT a week later, and the chatbot won’t recall any details from the previous exchange.
Excitingly, two independent teams may have cracked the code on memory issues in AI. MIT scientists identified why AI forgets mid-conversation and devised a potential fix.
Simultaneously, OpenAI developers are testing long-term memory, which lets users instruct ChatGPT to remember, recall, and even forget specific parts of conversations, offering a more dynamic interaction.
Enhancing Performance During Conversations
The researchers found a way to improve a chatbot’s short-term memory by modifying its key-value cache, the store that holds chunks of text called tokens and replaces the oldest ones when it fills. Their approach, termed “StreamingLLM,” is detailed in a paper posted Dec. 12, 2023, to the preprint server arXiv.
Because of these memory limits, a chatbot normally replaces its oldest tokens with newer ones as a conversation grows. StreamingLLM changes that policy: the LLM always preserves the first four tokens and evicts only from the fifth token onward. The chatbot still forgets as its cache fills, but it retains the earliest part of the interaction.
The order and labeling of tokens (first, second, third and so on) matter because they feed into an “attention map” for the ongoing conversation, which encodes how strongly each token relates to every other token.
When tokens are evicted, say when the fifth token is replaced, StreamingLLM leaves the remaining tokens’ encodings untouched: the sixth token keeps its original encoding as the sixth, rather than being re-encoded as the new “fifth” simply because of its new position in line.
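The eviction policy described above can be sketched in a few lines of Python. This is an illustrative toy, not the researchers’ implementation: the tiny cache size, the `StreamingCache` class, and the pair layout are assumptions made for clarity, and real StreamingLLM operates on attention keys and values inside the model, not on raw words.

```python
from collections import deque

NUM_SINKS = 4    # the first four tokens are never evicted
CACHE_SIZE = 8   # toy capacity; real KV caches hold thousands of tokens


class StreamingCache:
    """Toy sketch of StreamingLLM-style token eviction."""

    def __init__(self, num_sinks=NUM_SINKS, capacity=CACHE_SIZE):
        self.num_sinks = num_sinks
        self.capacity = capacity
        self.sinks = []        # (position, token) pairs, kept for good
        self.window = deque()  # rolling window of more recent tokens
        self.next_pos = 0      # original position label for each new token

    def add(self, token):
        # Each token keeps the position label it was assigned on arrival,
        # mirroring how StreamingLLM preserves the original encodings.
        entry = (self.next_pos, token)
        self.next_pos += 1
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(entry)  # the first four tokens are pinned
        else:
            self.window.append(entry)
            # When full, evict the oldest non-pinned token (fifth onward).
            if len(self.sinks) + len(self.window) > self.capacity:
                self.window.popleft()

    def contents(self):
        return self.sinks + list(self.window)


cache = StreamingCache()
for tok in "the cat sat on the mat and then ran away".split():
    cache.add(tok)

# The first four tokens survive with their original labels; the oldest
# later tokens ("the" at position 4, "mat" at position 5) were evicted.
print(cache.contents())
```

Note that after eviction, the token “and” still carries label 6 even though it now sits fifth in the cache; that is the encoding-preservation behavior the researchers describe.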
With these modifications, the chatbot maintained strong performance even past 4 million words, the scientists found. It also ran 22 times faster than an alternative short-term memory method that avoids performance crashes by continuously recomputing parts of the earlier conversation.
Study lead author Guangxuan Xiao, an electrical engineering and computer science graduate student at MIT, expressed, “Now, with this method, we can consistently deploy these large language models. Creating a chatbot that remains accessible for ongoing conversations allows for novel applications and continuous interaction based on recent discussions.”
StreamingLLM has been integrated into TensorRT-LLM, Nvidia’s open-source LLM optimization library, which developers use as a basis for building their own AI models.
The researchers are also working on enhancing StreamingLLM by enabling it to identify and reintegrate evicted tokens if they become necessary again.
ChatGPT Never Forgets Important Details
OpenAI is actively experimenting with enhancing ChatGPT’s long-term memory. This initiative aims to enable users to seamlessly continue conversations and establish a more substantial and interactive relationship with the AI chatbot.
During interactions with the LLM, users can instruct ChatGPT to remember specific details or grant it the autonomy to store relevant elements for future reference.
Notably, these memories are not tied to individual conversations: deleting a chat won’t erase its memories, and removing a specific memory requires a separate interface. Unless memories are manually deleted, every new chat starts with ChatGPT preloaded with what it has previously saved.
OpenAI illustrated the feature’s practical utility with examples. In one instance, the chatbot remembers that a kindergarten teacher with 25 students prefers 50-minute lessons followed by an activity, and it uses that information to help draft lesson plans.
In another, ChatGPT remembers that a user’s toddler loves jellyfish and recalls the detail when designing a birthday card for the child.
The company has rolled out the new memory features to a small group of ChatGPT users, representatives said on Feb. 13, with a wider release to all users planned for the near future.
OpenAI intends to use information from memories to improve its models, company representatives said. They emphasized a commitment to assessing and mitigating biases, and said ChatGPT won’t remember sensitive information, such as health details, unless the user explicitly asks it to. Users with memory access can also switch to a “temporary chat” mode in which memory is entirely deactivated.