Implementing Memory in LLM Applications Using LangChain
https://www.codecademy.com/article/implementing-memory-in-llm-applications-using-lang-chain
Old version (legacy docs)
https://python.langchain.com/v0.1/docs/modules/memory/types/buffer/
How to migrate to LangGraph memory
https://python.langchain.com/docs/versions/migrating_memory/
What is Memory in LangChain?
In LangChain, memory is implemented by passing information from the chat history along with the query as part of the prompt. LangChain provides us with different modules we can use to implement memory.
Based on the implementation and functionality, we have the following memory types in LangChain.
- Conversation Buffer Memory: This memory stores all the messages in the conversation history.
- Conversation Buffer Window Memory: The conversation buffer window memory stores the k most recent interactions of the conversation history. We can specify k according to our needs.
- Entity Memory: This type of memory remembers facts about entities, such as people, places, objects, and others, in the conversation. It extracts information about entities and builds up its knowledge as the conversation progresses.
- Conversation Summary Memory: As the name suggests, conversation summary memory summarizes the conversation and stores the current summary. This memory is helpful for longer conversations and saves costs by minimizing the number of tokens used in the conversation.
- Conversation Summary Buffer Memory: The conversation summary buffer memory combines the conversation summary memory and the conversation buffer window memory. It stores the last k messages of the conversation and a summary of the previous messages. A minimal sketch of configuring the window and summary variants follows this list.
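For instance, here is a small sketch of how the windowed and summary variants are configured, assuming the legacy langchain.memory classes from the v0.1 docs linked above; the model choice and example strings are purely illustrative:

```python
import os

from langchain.memory import ConversationBufferWindowMemory, ConversationSummaryMemory
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"
llm = ChatGoogleGenerativeAI(model="gemini-pro")

# Keep only the k most recent interactions in the prompt.
window_memory = ConversationBufferWindowMemory(k=2, memory_key="chat_history")

# Summarize older turns with the LLM instead of storing them verbatim.
summary_memory = ConversationSummaryMemory(llm=llm, memory_key="chat_history")

# Both expose the same interface: save a turn, then load it back.
window_memory.save_context({"input": "Hi, I'm Alice."}, {"output": "Hello Alice!"})
print(window_memory.load_memory_variables({}))
```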
How to Implement Memory in LangChain?
To implement memory in LangChain, we need to store and use previous conversations while answering a new query.
For this, we will first implement a conversation buffer memory that stores the previous interactions. Next, we will create a prompt template that passes the messages stored in memory to the LLM along with each new query.
Finally, we will use an LLM chain to run the queries using the memory, the prompt template, and the LLM object, as shown below:
```python
import os

from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY"

llm = ChatGoogleGenerativeAI(model="gemini-pro")

first_prompt = "Who is Elon Musk? Answer in 1 sentence."
second_prompt = "When was he born?"

# Buffer memory that stores the full conversation under the "chat_history" key;
# return_messages=True so MessagesPlaceholder receives message objects rather
# than a single string.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Prompt template that injects the stored chat history before the new query
prompt = ChatPromptTemplate(
    messages=[
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

# Chain that ties together the LLM, the prompt template, and the memory
conversation_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=memory,
)

first_output = conversation_chain.run({"query": first_prompt})
second_output = conversation_chain.run({"query": second_prompt})

print("The first prompt is:", first_prompt)
print("The second prompt is:", second_prompt)
print("The output for the first prompt is:")
print(first_output)
print("The output for the second prompt is:")
print(second_output)
```
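To confirm that the second query actually saw the first exchange, a quick check (reusing the memory object from the snippet above) is to dump what the buffer memory has stored:

```python
# The "chat_history" key matches the memory_key configured above; it now holds
# both exchanges, which MessagesPlaceholder injects into the prompt on each run.
print(memory.load_memory_variables({})["chat_history"])
```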
Memory
https://langchain-ai.github.io/langgraph/concepts/memory/
How to add memory to chatbots
https://python.langchain.com/docs/how_to/chatbots_memory/
```python
from langchain_core.messages import HumanMessage, RemoveMessage, SystemMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# `model` is assumed to be a LangChain chat model initialized elsewhere,
# e.g. a ChatOpenAI or ChatGoogleGenerativeAI instance.

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability. "
        "The provided chat history includes a summary of the earlier conversation."
    )
    system_message = SystemMessage(content=system_prompt)
    message_history = state["messages"][:-1]  # exclude the most recent user input
    # Summarize the messages if the chat history reaches a certain size
    if len(message_history) >= 4:
        last_human_message = state["messages"][-1]
        # Invoke the model to generate conversation summary
        summary_prompt = (
            "Distill the above chat messages into a single summary message. "
            "Include as many specific details as you can."
        )
        summary_message = model.invoke(
            message_history + [HumanMessage(content=summary_prompt)]
        )

        # Delete messages that we no longer want to show up
        delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
        # Re-add user message
        human_message = HumanMessage(content=last_human_message.content)
        # Call the model with summary & response
        response = model.invoke([system_message, summary_message, human_message])
        message_updates = [summary_message, human_message, response] + delete_messages
    else:
        message_updates = model.invoke([system_message] + state["messages"])

    return {"messages": message_updates}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
```
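One way to exercise the compiled graph is to pass a thread_id in the config so the MemorySaver checkpointer can persist and replay the conversation. This is a sketch; the thread ID and messages are illustrative:

```python
from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "demo-thread"}}

# First turn is stored by the checkpointer under this thread_id.
app.invoke({"messages": [HumanMessage(content="Hi, I'm Nemo.")]}, config)

# On a later turn for the same thread, the stored messages (or their summary,
# once the history grows past four messages) are fed back to the model.
result = app.invoke(
    {"messages": [HumanMessage(content="What was my name again?")]}, config
)
print(result["messages"][-1].content)
```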
LangMem
https://langchain-ai.github.io/long-term-memory/
```python
import uuid

import anthropic
from langmem import AsyncClient
from langsmith import traceable

client = AsyncClient()
user_id = str(uuid.uuid4())
thread_id = str(uuid.uuid4())

# A short conversation; user messages are tagged with the user_id so LangMem
# can attribute the extracted facts to this user.
messages = [
    {
        "role": "user",
        "content": "Hi, I love playing basketball!",
        "metadata": {"user_id": user_id},
    },
    {
        "role": "assistant",
        "content": "That's great! Basketball is a fun sport. Do you have a favorite player?",
    },
    {
        "role": "user",
        "content": "Yeah, Steph Curry is amazing!",
        "metadata": {"user_id": user_id},
    },
]

# Store the conversation and trigger memory extraction for the thread
# (run inside an async context, e.g. a notebook, to use top-level await).
await client.add_messages(thread_id=thread_id, messages=messages)
await client.trigger_all_for_thread(thread_id=thread_id)

anthropic_client = anthropic.AsyncAnthropic()


@traceable(name="Claude", run_type="llm")
async def completion(messages: list, model: str = "claude-3-haiku-20240307"):
    system_prompt = messages[0]["content"]
    msgs = []
    for m in messages[1:]:
        msgs.append({k: v for k, v in m.items() if k != "metadata"})
    response = await anthropic_client.messages.create(
        model=model,
        system=system_prompt,
        max_tokens=1024,
        messages=msgs,
    )
    return response


async def completion_with_memory(messages, user_id):
    # Retrieve memories relevant to the latest user message and inject them
    # into the system prompt.
    memories = await client.query_user_memory(
        user_id=user_id,
        text=messages[-1]["content"],
    )
    facts = "\n".join([mem["text"] for mem in memories["memories"]])
    system_prompt = {
        "role": "system",
        "content": f"Here are some things you know about the user:\n\n{facts}",
    }
    return await completion([system_prompt] + messages)


new_messages = [
    {
        "role": "user",
        "content": "Do you remember who my favorite basketball player is?",
        "metadata": {"user_id": user_id},
    }
]
response = await completion_with_memory(new_messages, user_id=user_id)
print(response.content[0].text)
```
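Note that the memories in this example are keyed by user_id rather than thread_id: the facts extracted from the first conversation are retrieved by query_user_memory for a brand-new message list, so they carry over to later threads for the same user.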