01. PDF document-based QA (Question-Answering)
Understanding the basic structure of RAG
1. Pre-processing: steps 1-4
2. RAG execution (runtime): steps 5-8
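The two phases can be illustrated without any external services. The following is a minimal sketch, not the LangChain pipeline itself: the sample text, the sentence-based splitter, and the word-count "embedding" are toy stand-ins chosen for illustration, and the helper names (`split_text`, `embed`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text):
    """Toy splitter: one chunk per sentence (stand-in for a real text splitter)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# --- Pre-processing (steps 1-4): load, split, embed, store ---
document = (
    "Samsung Electronics developed an AI named Samsung Gauss. "
    "The report also covers AI policy in several countries. "
    "Chunking splits long text into smaller pieces."
)
chunks = split_text(document)
index = [(chunk, embed(chunk)) for chunk in chunks]  # toy "vector store"


# --- Runtime (steps 5-8): retrieve, build the prompt, call the model ---
def retrieve(question, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question):
    """Fill the retrieved context into a question-answering prompt."""
    context = "\n".join(retrieve(question))
    return f"#Question:\n{question}\n#Context:\n{context}\n#Answer:"


question = "What is the name of the AI developed by Samsung Electronics?"
print(build_prompt(question))  # this string is what would be sent to the LLM
```

Pre-processing runs once per document set, while the runtime steps run for every question. That split is why the index is built ahead of time.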
The name of the AI developed in-house by Samsung Electronics is 'Samsung Gauss'.
# Run the chain
# Enter a query for the document and print out the answer.
question = "What is the name of the AI developed by Samsung Electronics?"
response = chain.invoke(question)
print(response)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Step 1: Load the documents
loader = PyMuPDFLoader("data/SPRI_AI_Brief_2023년12월호_F.pdf")
docs = loader.load()
# Step 2: Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
split_documents = text_splitter.split_documents(docs)
# Step 3: Create the embedding model
embeddings = OpenAIEmbeddings()
# Step 4: Create the vector DB and store the chunks
# Embed the split documents and build a FAISS vector store from them.
vectorstore = FAISS.from_documents(documents=split_documents, embedding=embeddings)
# Step 5: Create a retriever
# The retriever searches the vector store for chunks relevant to a query.
retriever = vectorstore.as_retriever()
# Step 6: Create the prompt
# Define the prompt template.
prompt = PromptTemplate.from_template(
"""You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Answer in Korean.
#Question:
{question}
#Context:
{context}
#Answer:"""
)
# Step 7: Create the language model (LLM)
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
# Step 8: Create the chain
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
Full code
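The chain built in step 8 hands the retriever's output to the prompt as a list of document objects, so the `{context}` slot receives that list's string form, which typically includes metadata alongside the text. A common refinement is to join only the text of each chunk before it reaches the prompt. The sketch below assumes a hypothetical `format_docs` helper; the `Doc` class is a stand-in for LangChain's `Document` so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document chunk (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


# Two made-up chunks standing in for what a retriever might return.
retrieved = [
    Doc("First chunk.", {"page": 1}),
    Doc("Second chunk.", {"page": 2}),
]
context = format_docs(retrieved)
print(context)
```

In the chain, such a helper could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, so the prompt sees plain text instead of the raw document list.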
RAG basic pipeline (1~8 steps)
Environment setup
Documents utilized for practice