# 02. Naive RAG

## Naive RAG <a href="#naive-rag" id="naive-rag"></a>

**Steps**

1. Perform Naive RAG

![](https://wikidocs.net/images/page/267809/langgraph-naive-rag.png)

### Environment Setup <a href="#id-1" id="id-1"></a>

```python
# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv()
```

```
 True 
```

```python
# Set up LangSmith tracking. https://smith.langchain.com
# !pip install -qU langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH17-LangGraph-Structures")
```

```
 Start tracking LangSmith. 
[Project name] 
CH17-LangGraph-Structures 
```

### Basic PDF-based Retrieval Chain creation <a href="#pdf-retrieval-chain" id="pdf-retrieval-chain"></a>

Here, we create a Retrieval Chain based on PDF documents, using the simplest possible structure.

In LangGraph, however, the Retriever and the Chain are created separately. Only then can each node perform its own detailed processing.

```python
from rag.pdf import PDFRetrievalChain

# Load a PDF document.
pdf = PDFRetrievalChain(["data/SPRI_AI_Brief_2023년12월호_F.pdf"]).create_chain()

# Create a retriever and a chain.
pdf_retriever = pdf.retriever
pdf_chain = pdf.chain
```

First, use `pdf_retriever` to get search results.

```python
search_result = pdf_retriever.invoke("Please tell me the companies and amounts invested in Anthropic.")
search_result
```

```
 [Document(metadata={'source': 'data/SPRI_AI_Brief_2023년12월호_F.pdf', 'file_path': 'data/SPRI_AI_Brief_2023년12월호_F.pdf', 'page': 13, 'total_pages': 23, ...}, page_content='1. Policy/Legal 2. Enterprise/Industry 3. Technology/Research 4. Workforce/Education\nGoogle agrees to invest up to $2 billion in Anthropic, strengthening AI cooperation\nKEY Contents\nn Google has agreed to invest up to $2 billion in Anthropic, investing $500 million upfront, and Anthropic has also signed a contract to use its cloud services\nn The three major cloud providers Google, Microsoft, and Amazon ... next-generation AI models ... up to $2 billion investment agreement and cloud service delivery to Anthropic'), Document(metadata={'source': 'data/SPRI_AI_Brief_2023년12월호_F.pdf', 'file_path': 'data/SPRI_AI_Brief_2023년12월호_F.pdf', 'page': 13, 'total_pages': 23, 'Author': 'dj', 'ModDate': "D:20231208132838+09'00'", 'PDFVersion': '1.4'}, page_content='Google, up to $2 billion investment agreement in Anthropic ...'), ...]
```

Pass the previously retrieved results as the context to the chain.

```python
# Generate answers based on search results.
answer = pdf_chain.invoke(
    {
        "question": "Please tell us the companies and amounts invested in Anthropic.",
        "context": search_result,
        "chat_history": [],
    }
)
print(answer)
```

```
Google has agreed to invest up to $2 billion in Anthropic, of which $500 million has been invested upfront. In addition, Google had already invested $550 million in February 2023. Amazon has announced an investment plan of up to $4 billion in Anthropic. 

**Source** 
- data/SPRI_AI_Brief_2023년12월호_F.pdf (page 13) 
```

### State definition <a href="#state" id="state"></a>

`State` : Defines the state shared between nodes in the graph.

It typically uses the `TypedDict` format.

```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages


# GraphState State Definition
class GraphState(TypedDict):
    question: Annotated[str, "Question"]  # question
    context: Annotated[str, "Context"]  # Search results for the document
    answer: Annotated[str, "Answer"]  # answer
    messages: Annotated[list, add_messages]  # messages (accumulated list)
```
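To see what a reducer such as `add_messages` does conceptually, here is a toy sketch in plain Python (not LangGraph's actual implementation): a channel without a reducer is overwritten on every update, while a channel annotated with a reducer merges the old and new values, so messages accumulate across node executions.

```python
from typing import Annotated, TypedDict


# Toy reducer mimicking the idea behind add_messages:
# append new messages instead of overwriting the list.
def append_messages(old: list, new: list) -> list:
    return old + new


class ToyState(TypedDict):
    answer: Annotated[str, "Answer"]            # overwritten on every update
    messages: Annotated[list, append_messages]  # accumulated across updates


def apply_update(state: ToyState, update: dict) -> ToyState:
    # Merge an update into the state, honoring any reducer in the annotation.
    merged = dict(state)
    for key, value in update.items():
        metadata = ToyState.__annotations__[key].__metadata__[0]
        if callable(metadata):
            merged[key] = metadata(state[key], value)  # reduce (accumulate)
        else:
            merged[key] = value  # plain overwrite
    return merged  # type: ignore[return-value]


state: ToyState = {"answer": "", "messages": []}
state = apply_update(state, {"answer": "A1", "messages": [("user", "Q1")]})
state = apply_update(state, {"answer": "A2", "messages": [("assistant", "A2")]})
print(state["answer"])         # "A2" -- overwritten
print(len(state["messages"]))  # 2 -- accumulated
```

This is why `llm_answer` below can return only the new `(user, question)` and `(assistant, response)` pair: LangGraph's `add_messages` reducer appends them to the existing history instead of replacing it.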

### Node definition <a href="#node" id="node"></a>

* `Nodes` : Nodes that handle each step. Usually implemented as a Python function, with state values as both input and output.

**Reference**

* A node receives the current `State` as input, performs its defined logic, and returns an updated `State`.

```python
from langchain_teddynote.messages import messages_to_history
from rag.utils import format_docs


# Document Search Node
def retrieve_document(state: GraphState) -> GraphState:
    # Get the question from the state.
    latest_question = state["question"]

    # Search the documentation to find relevant articles.
    retrieved_docs = pdf_retriever.invoke(latest_question)

    # Formats the retrieved document (for input into the prompt)
    retrieved_docs = format_docs(retrieved_docs)

    # Stores the searched document in the context key.
    return GraphState(context=retrieved_docs)


# Generate Answer Node
def llm_answer(state: GraphState) -> GraphState:
    # Get the question from the state.
    latest_question = state["question"]

    # Get the searched documents in status.
    context = state["context"]

    # Call the chain to generate an answer.
    response = pdf_chain.invoke(
        {
            "question": latest_question,
            "context": context,
            "chat_history": messages_to_history(state["messages"]),
        }
    )
    # Stores generated answers, (user's questions, answers) messages in the state.
    return GraphState(
        answer=response, messages=[("user", latest_question), ("assistant", response)]
    )
```

### Edges <a href="#edges" id="edges"></a>

* `Edges` : A Python function that determines the next `Node` to run based on the current `State`.

There are general edges, conditional edges, and more.
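As a toy illustration in plain Python (not the LangGraph API), a conditional edge is just a function that inspects the current state and returns the name of the next node to run. The node names and routing condition below are hypothetical, chosen only to mirror this tutorial's retrieve-then-answer flow:

```python
from typing import Callable, TypedDict


class State(TypedDict):
    question: str
    context: str
    answer: str


# Hypothetical nodes: a retriever that may or may not find context,
# and two downstream nodes chosen by a conditional edge.
def retrieve(state: State) -> State:
    found = "Anthropic" in state["question"]
    return {**state, "context": "some documents" if found else ""}


def llm_answer(state: State) -> State:
    return {**state, "answer": f"Answer based on: {state['context']}"}


def rewrite_question(state: State) -> State:
    return {**state, "question": state["question"] + " (rephrased)"}


# The conditional edge: route on whether retrieval found anything.
def route_on_context(state: State) -> str:
    return "llm_answer" if state["context"] else "rewrite_question"


nodes: dict[str, Callable[[State], State]] = {
    "retrieve": retrieve,
    "llm_answer": llm_answer,
    "rewrite_question": rewrite_question,
}

# Minimal runner: execute retrieve, then follow the conditional edge once.
state: State = {"question": "Who invested in Anthropic?", "context": "", "answer": ""}
state = nodes["retrieve"](state)
next_node = route_on_context(state)
state = nodes[next_node](state)
print(next_node)  # "llm_answer" -- context was found, so we answer
```

The Naive RAG graph below uses only general (unconditional) edges; conditional routing like this appears in later, more advanced structures.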

```python
from langgraph.graph import END, StateGraph
from langgraph.checkpoint.memory import MemorySaver

# create a graph
workflow = StateGraph(GraphState)

# node definition
workflow.add_node("retrieve", retrieve_document)
workflow.add_node("llm_answer", llm_answer)

# edge definition
workflow.add_edge("retrieve", "llm_answer")  # retrieval -> answer
workflow.add_edge("llm_answer", END)  # answer -> end

# Setting the graph entry point
workflow.set_entry_point("retrieve")

# Set checkpoint
memory = MemorySaver()

# compile
app = workflow.compile(checkpointer=memory)
```

Visualize the compiled graph.

```python
from langchain_teddynote.graphs import visualize_graph

visualize_graph(app)
```

### Graph execution <a href="#id-2" id="id-2"></a>

* `config` : Passes the configuration information needed when running the graph.
* `recursion_limit` : Sets the maximum number of recursions when running the graph.
* `inputs` : Passes the input information required when running the graph.

**Reference**

* For message output streaming, refer to [Everything about LangGraph streaming modes](https://wikidocs.net/265770).

The `stream_graph` function below streams the output of only the specified nodes, so you can easily check the streaming output of a specific node.

```python
from langchain_core.runnables import RunnableConfig
from langchain_teddynote.messages import stream_graph, random_uuid

# config settings (max recursion count, thread_id)
config = RunnableConfig(recursion_limit=20, configurable={"thread_id": random_uuid()})

# Enter your question
inputs = GraphState(question="Please tell me the companies and amounts invested in Anthropic.")

# Running the graph
stream_graph(app, inputs, config, ["llm_answer"])
```

```
 ================================================== 
🔄 Node: llm_answer🔄 
- - - - - - - - - - - - - - - - - - - - - - - - - - - -  
Google has agreed to invest up to $2 billion in Anthropic, of which $500 million has been invested upfront. Amazon has announced an investment plan of up to $4 billion in Anthropic. 

**Source** 
- data/SPRI_AI_Brief_2023년12월호_F.pdf (page 14) 
```

```python
outputs = app.get_state(config).values

print(f'Question: {outputs["question"]}')
print("===" * 20)
print(f'Answer:\n{outputs["answer"]}')
```

```
 Question: Please tell me the companies and amounts invested in Anthropic. 
============================================================ 
Answer: 
Google has agreed to invest up to $2 billion in Anthropic, of which $500 million has been invested upfront. Amazon has announced an investment plan of up to $4 billion in Anthropic. 

**Source** 
- data/SPRI_AI_Brief_2023년12월호_F.pdf (page 14) 
```

<br>
