04. Long context rearrangement (LongContextReorder)

Regardless of the model's architecture, performance degrades considerably when more than 10 retrieved documents are included in the context.

Simply put, when the relevant information sits in the middle of a long context, the model tends to overlook the documents provided.

For more information, see the following paper:

https://arxiv.org/abs/2307.03172

To avoid this problem, you can reorder the documents after retrieval so that the most relevant ones do not end up in the middle of the context.

Create a retriever that stores and retrieves text data using the Chroma vector store.
Use the retriever's invoke method to search for documents relevant to a given query.

# Configuration file for managing API keys as environment variables.
from dotenv import load_dotenv

# Load the API key information
load_dotenv()

True

# Set up LangSmith tracking. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH11-Retriever")

 Start tracking LangSmith. 
[Project name] 
CH11-Retriever

from langchain_core.prompts import PromptTemplate
from langchain_community.document_transformers import LongContextReorder
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings


# Create the embedding model.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "This is just something I wrote randomly.",
    "ChatGPT, an AI designed to converse with users, can answer a variety of questions.",
    "iPhone, iPad, MacBook, etc. are representative products released by Apple.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers.",
    "Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line.",
    "ChatGPT can also be used to solve complex problems or suggest creative ideas.",
    "Bitcoin is also called digital gold and is gaining popularity as a store of value.",
    "ChatGPT's features are continuously evolving through learning and updates.",
    "The FIFA World Cup, held every fourth year, is the biggest event in international football.",
]


# Create a retriever (k is set to 10)
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)

Enter a query into the retriever to perform a search.

query = "What can you tell me about ChatGPT?"

# Get related documents sorted by relevance score.
docs = retriever.invoke(query)
docs

[Document(page_content='ChatGPT can also be used to solve complex problems or suggest creative ideas.'),
 Document(page_content="ChatGPT's features are continuously evolving through learning and updates."),
 Document(page_content='ChatGPT, an AI designed to converse with users, can answer a variety of questions.'),
 Document(page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(page_content='This is just something I wrote randomly.'),
 Document(page_content="ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers."),
 Document(page_content='Bitcoin is also called digital gold and is gaining popularity as a store of value.'),
 Document(page_content='iPhone, iPad, MacBook, etc. are representative products released by Apple.'),
 Document(page_content='Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line.'),
 Document(page_content='The FIFA World Cup, held every fourth year, is the biggest event in international football.')]
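
Because k equals the number of stored texts, the retriever returns every document, and only the order carries relevance information. The toy sketch below illustrates that idea with hand-made 2-dimensional vectors and cosine similarity. The vectors, the names cosine_similarity and top_k, and the choice of cosine similarity are all assumptions made for illustration; the actual embeddings come from OpenAI and the actual distance metric depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k):
    """Return the names of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in doc_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings (not real model output).
toy_vectors = {
    "chatgpt": [0.9, 0.1],
    "apple": [0.2, 0.8],
    "bitcoin": [0.1, 0.9],
}
query_vector = [1.0, 0.0]

# With k equal to the collection size, unrelated documents are returned too;
# they simply appear later in the list.
print(top_k(query_vector, toy_vectors, k=3))
```

This is why the result list above contains the Bitcoin, Apple, and World Cup sentences even though the query is about ChatGPT.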

Create an instance of the LongContextReorder class named reordering.

Call reordering.transform_documents(docs) to rearrange the document list docs.
Less relevant documents are placed in the middle of the list, and more relevant documents are placed at the beginning and end.

# Rearrange the documents.
# Less relevant documents are placed in the middle of the list, and more relevant documents are placed at the beginning/end.
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Check that the most relevant documents are located at the beginning and end.
reordered_docs

[Document(page_content="ChatGPT's features are continuously evolving through learning and updates."),
 Document(page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(page_content="ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers."),
 Document(page_content='iPhone, iPad, MacBook, etc. are representative products released by Apple.'),
 Document(page_content='The FIFA World Cup, held every fourth year, is the biggest event in international football.'),
 Document(page_content='Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line.'),
 Document(page_content='Bitcoin is also called digital gold and is gaining popularity as a store of value.'),
 Document(page_content='This is just something I wrote randomly.'),
 Document(page_content='ChatGPT, an AI designed to converse with users, can answer a variety of questions.'),
 Document(page_content='ChatGPT can also be used to solve complex problems or suggest creative ideas.')]

Create a question-answering chain using context reordering

def format_docs(docs):
    # Join the page contents of the documents, one per line.
    return "\n".join(doc.page_content for doc in docs)

print(format_docs(docs))

ChatGPT can also be used to solve complex problems or suggest creative ideas.
ChatGPT's features are continuously evolving through learning and updates.
ChatGPT, an AI designed to converse with users, can answer a variety of questions.
ChatGPT was developed by OpenAI and is continuously being improved.
This is just something I wrote randomly.
ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers.
Bitcoin is also called digital gold and is gaining popularity as a store of value.
iPhone, iPad, MacBook, etc. are representative products released by Apple.
Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line.
The FIFA World Cup, held every fourth year, is the biggest event in international football.

def format_docs(docs):
    return "\n".join(
        [
            f"[{i}] {doc.page_content} [source: teddylee777@gmail.com]"
            for i, doc in enumerate(docs)
        ]
    )


def reorder_documents(docs):
    # Reorder
    reordering = LongContextReorder()
    reordered_docs = reordering.transform_documents(docs)
    combined = format_docs(reordered_docs)
    print(combined)
    return combined

Print the rearranged documents.

# Print the rearranged documents
_ = reorder_documents(docs)

[0] ChatGPT's features are continuously evolving through learning and updates. [source: teddylee777@gmail.com]
[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[2] ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers. [source: teddylee777@gmail.com]
[3] iPhone, iPad, MacBook, etc. are representative products released by Apple. [source: teddylee777@gmail.com]
[4] The FIFA World Cup, held every fourth year, is the biggest event in international football. [source: teddylee777@gmail.com]
[5] Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line. [source: teddylee777@gmail.com]
[6] Bitcoin is also called digital gold and is gaining popularity as a store of value. [source: teddylee777@gmail.com]
[7] This is just something I wrote randomly. [source: teddylee777@gmail.com]
[8] ChatGPT, an AI designed to converse with users, can answer a variety of questions. [source: teddylee777@gmail.com]
[9] ChatGPT can also be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]

from langchain_core.prompts import ChatPromptTemplate
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

# Prompt Template
template = """Given these text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""

# Prompt definition

prompt = ChatPromptTemplate.from_template(template)

# Chain definition
chain = (
    {
        "context": itemgetter("question")
        | retriever
        | RunnableLambda(reorder_documents),  # Retrieve and reorder context based on the question.
        "question": itemgetter("question"),  # Extract the question.
        "language": itemgetter("language"),  # Extract the answer language.
    }
    | prompt  # Pass the values to the prompt template.
    | ChatOpenAI(model="gpt-4o-mini")  # Pass the prompt to the language model.
    | StrOutputParser()  # Parse the model's output into a string.
)
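
To make the data flow of the first chain step easier to follow, here is a minimal plain-Python sketch of how the input dictionary is turned into the prompt text. It calls neither LangChain nor any API: fake_retriever and fake_reorder_and_format are hypothetical stand-ins for retriever and reorder_documents, build_prompt is a hypothetical helper, and the template mirrors the one defined in this section.

```python
from operator import itemgetter


def fake_retriever(question):
    # Hypothetical stand-in for retriever.invoke: ignores the question
    # and returns fixed strings instead of Document objects.
    return ["doc A", "doc B", "doc C"]


def fake_reorder_and_format(docs):
    # Hypothetical stand-in for reorder_documents: it only numbers and
    # joins the documents, and does not reorder them.
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs))


template = """Given these text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""


def build_prompt(inputs):
    # Each key is computed from the same input dictionary.
    question = itemgetter("question")(inputs)
    values = {
        "context": fake_reorder_and_format(fake_retriever(question)),
        "question": question,
        "language": itemgetter("language")(inputs),
    }
    # Fill the template placeholders with the computed values.
    return template.format(**values)


prompt_text = build_prompt(
    {"question": "What can you tell me about ChatGPT?", "language": "KOREAN"}
)
print(prompt_text)
```

In the real chain, the resulting prompt is then passed to the language model and the output parser.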

Enter the query in question and the answer language in language.

The reordered search results are printed as well, so you can check them.

answer = chain.invoke(
    {"question": "What can you tell me about ChatGPT?", "language": "KOREAN"}
)

[0] ChatGPT's features are continuously evolving through learning and updates. [source: teddylee777@gmail.com]
[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[2] ChatGPT has learned a lot of data to understand users' questions and generate appropriate answers. [source: teddylee777@gmail.com]
[3] iPhone, iPad, MacBook, etc. are representative products released by Apple. [source: teddylee777@gmail.com]
[4] The FIFA World Cup, held every fourth year, is the biggest event in international football. [source: teddylee777@gmail.com]
[5] Wearable devices like the Apple Watch and AirPods are also part of Apple’s popular product line. [source: teddylee777@gmail.com]
[6] Bitcoin is also called digital gold and is gaining popularity as a store of value. [source: teddylee777@gmail.com]
[7] This is just something I wrote randomly. [source: teddylee777@gmail.com]
[8] ChatGPT, an AI designed to converse with users, can answer a variety of questions. [source: teddylee777@gmail.com]
[9] ChatGPT can also be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]

Print the answer.

print(answer)

ChatGPT is an artificial intelligence developed by OpenAI, learning large amounts of data to understand user questions and generate appropriate answers. This AI is further evolving through continuous learning and updates, and can also be used to answer a variety of questions or solve complex problems. It also has the ability to suggest creative ideas. Users can interact like talking to ChatGPT.
