PydanticOutputParser is a class that helps convert the output of a language model into structured information. Instead of a plain-text response, it lets you provide the information users need in a clear, systematic form.
By using this class, you can convert the language model's output into a specific data model, which makes the information easier to process and use.
A PydanticOutputParser (like most OutputParsers) must implement two key methods.
get_format_instructions() : Provides instructions that define the format of the information the language model should output. For example, it can return, as a string, the fields the language model must output and a description of their shape. These instructions matter a great deal: by following them, the language model can structure its output so that it can be converted into a specific data model.
parse() : Takes the language model's output (assumed to be a string), analyzes it, and converts it into a specific structure. Using a tool such as Pydantic, the input string is validated against a predefined schema and converted into a data structure that follows that schema.
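To make the two methods concrete, here is a minimal standard-library sketch of the same contract. `ToyEmailParser` and the dataclass version of `EmailSummary` are hypothetical stand-ins written for illustration. This is not how PydanticOutputParser is implemented, and it performs only a simple key and type check rather than Pydantic's full validation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class EmailSummary:
    person: str
    email: str
    subject: str


class ToyEmailParser:
    """Hypothetical stand-in that mimics the two-method contract of an output parser."""

    def get_format_instructions(self) -> str:
        # Tell the model which JSON keys it is expected to produce.
        keys = ", ".join(f'"{f.name}"' for f in fields(EmailSummary))
        return f"Return only a JSON object with the string keys: {keys}."

    def parse(self, text: str) -> EmailSummary:
        # Models often wrap JSON in a Markdown fence; keep only the outermost object.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            raise ValueError("no JSON object found in model output")
        data = json.loads(text[start : end + 1])
        expected = {f.name for f in fields(EmailSummary)}
        if set(data) != expected:
            raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
        if not all(isinstance(v, str) for v in data.values()):
            raise ValueError("all fields must be strings")
        return EmailSummary(**data)


parser = ToyEmailParser()

# Simulated model reply: a JSON payload wrapped in a Markdown code fence.
fence = "`" * 3
payload = '{"person": "Kim Cheol-su", "email": "chulsoo.kim@bikecorporation.me", "subject": "ZENESIS proposal"}'
raw = fence + "json\n" + payload + "\n" + fence

summary = parser.parse(raw)
print(parser.get_format_instructions())
print(summary)
```

The instructions string goes into the prompt, and `parse()` is applied to whatever text comes back, so the two methods form a matched pair: one describes the shape, the other enforces it.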
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH03-OutputParser")
Given the email content (email_conversation), we parse the information in the email using a class defined in the Pydantic style (EmailSummary).
For reference, the description inside each Field explains which key information to extract from the text-form answer. The LLM reads this description and extracts the information it needs, so the description should be accurate and clear.
Define the prompt.
question : Receives the user's question.
email_conversation : Receives the body of the email.
format : Specifies the output format.
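The three variables are not all known at the same moment: `format` is fixed as soon as the parser exists, while `question` and `email_conversation` arrive with each request. The standard-library sketch below mimics that idea of partial formatting with `functools.partial`. It is only an analogy for what `prompt.partial(...)` does, and `FORMAT_INSTRUCTIONS` is a placeholder string, not real parser output.

```python
from functools import partial

TEMPLATE = (
    "QUESTION:\n{question}\n\n"
    "EMAIL CONVERSATION:\n{email_conversation}\n\n"
    "FORMAT:\n{format}\n"
)

# Placeholder for illustration; a real parser would generate this text.
FORMAT_INSTRUCTIONS = "Return a JSON object with the keys person, email, subject."

# Pre-fill `format` once; the other two variables stay open until call time.
render = partial(TEMPLATE.format, format=FORMAT_INSTRUCTIONS)

text = render(
    question="Please extract the main content from the email.",
    email_conversation="(email body goes here)",
)
print(text)
```

Pre-filling the format slot keeps the call site small: callers only pass the values that actually change between requests.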
Next, create the chain.
Run the chain and check the results.
Finally, use the parser to parse the results and convert them into an EmailSummary object.
Create a chain with the parser added
The output can then be produced directly as the Pydantic object that defines the result.
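The difference between a chain that ends at the model and one that ends at the parser can be sketched without any LLM call. In this toy version, `Step`, `fake_llm`, and the canned JSON reply are all hypothetical. The `|` operator is implemented with `__or__` purely to illustrate composition and is not LangChain's own implementation.

```python
import json


class Step:
    """Hypothetical wrapper that lets plain functions be composed with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) is the same as b.invoke(a.invoke(x)).
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


prompt = Step(lambda inputs: f"Summarize this email as JSON:\n{inputs['email_conversation']}")
# Stand-in for a model call: always returns the same canned JSON string.
fake_llm = Step(lambda _prompt: '{"person": "Kim Cheol-su", "date": "Next Tuesday 10 am"}')
parser = Step(json.loads)

text_chain = prompt | fake_llm             # ends with raw text
parsed_chain = prompt | fake_llm | parser  # ends with a structured value

inputs = {"email_conversation": "(email body goes here)"}
raw = text_chain.invoke(inputs)
result = parsed_chain.invoke(inputs)
print(type(raw).__name__, type(result).__name__)
```

With the parser as the last step, callers no longer handle the intermediate string at all; they receive the structured value directly.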
with_structured_output()
Adding an output parser with .with_structured_output(Pydantic) converts the output into a Pydantic object.
Reference
One drawback: the .with_structured_output() function does not support stream().
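One way to see why streaming and structured output sit uneasily together: a JSON document usually cannot be validated until its last character has arrived. The sketch below uses only the standard library and a hand-made list of chunks (hypothetical, not real model output). It illustrates the general constraint and is not a description of LangChain's internals.

```python
import json

# Hypothetical chunks, as a streaming model might emit them.
chunks = ['{"person": "Kim', ' Cheol-su", "date": ', '"Next Tuesday 10 am"}']

buffer = ""
parsed_after = None  # index of the chunk at which parsing first succeeds
result = None
for i, chunk in enumerate(chunks):
    buffer += chunk
    try:
        result = json.loads(buffer)
    except json.JSONDecodeError:
        continue  # still incomplete, keep accumulating
    parsed_after = i
    break

print(parsed_after, result)
```

Every intermediate buffer fails to parse, so a structured object only becomes available once the final chunk has been received.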
# Imports for real-time (streaming) output
from langchain_teddynote.messages import stream_response
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
email_conversation = """From: Kim Cheol-su (chulsoo.kim@bikecorporation.me)
To: Lee Eun-chae (eunchae@teddyinternational.me)
Subject: "ZENESIS" Bicycle distribution cooperation and meeting schedule proposal
Hello, Eunchae Lee,
I am Kim Cheol-su, Managing Director of Bike Corporation. I recently learned about your company's new bicycle, "ZENESIS", through a press release.
Bike Corporation is a leader in innovation and quality in the bicycle manufacturing and distribution industry, with long-term experience and expertise in this field.
We would like to request a detailed brochure on the ZENESIS model, especially information on technical specifications, battery performance, and design aspects. This will help us refine our proposed distribution strategy and marketing plan.
Also, I would like to propose a meeting next Tuesday (January 15th) at 10 am to discuss the possibility of cooperation in more depth. Would it be possible to meet at your office and talk?
Thank you.
Kim Cheol-su
Managing Director
Bike Corporation
"""
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate.from_template(
"Please extract important content from the following email.\n\n{email_conversation}"
)
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
chain = prompt | llm
answer = chain.stream({"email_conversation": email_conversation})
output = stream_response(answer, return_output=True)
**Important content extraction:**
1. **Sender:** Kim Cheol-su (chulsoo.kim@bikecorporation.me)
2. **Recipient:** Lee Eun-chae (eunchae@teddyinternational.me)
3. **Subject:** "ZENESIS" bicycle distribution cooperation and meeting schedule proposal
4. **Requests:**
   - Detailed brochure for the ZENESIS model (technical specifications, battery performance, design information)
5. **Meeting proposal:**
   - Date: Next Tuesday (January 15th)
   - Time: 10 am
   - Place: Your office
print(output)
**Important content extraction:**
1. **Sender:** Kim Cheol-su (chulsoo.kim@bikecorporation.me)
2. **Recipient:** Lee Eun-chae (eunchae@teddyinternational.me)
3. **Subject:** "ZENESIS" bicycle distribution cooperation and meeting schedule proposal
4. **Requests:**
   - Detailed brochure for the ZENESIS model (technical specifications, battery performance, design information)
5. **Meeting proposal:**
   - Date: Next Tuesday (January 15th)
   - Time: 10 am
   - Place: Your office
class EmailSummary(BaseModel):
    person: str = Field(description="Sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="Email Subject")
    summary: str = Field(description="Text summarizing the body of the email")
    date: str = Field(description="Meeting date and time mentioned in the body of the email")
# Create the PydanticOutputParser.
parser = PydanticOutputParser(pydantic_object=EmailSummary)
# Print the format instructions.
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {" type"
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"person": {"title": "Person", "description": "Sender of the email", "type": "string"}, "email": {"title": "Email", "description": "The email address of the sender", "type": "string"}, "subject": {"title": "Subject", "description": "Email Subject", "type": "string"}, "summary": {"title": "Summary", "description": "Text summarizing the body of the email", "type": "string"}, "date": {"title": "Date", "description": "Meeting date and time mentioned in the body of the email", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}
prompt = PromptTemplate.from_template(
"""
You are a helpful assistant. Please answer the following questions in KOREAN.
QUESTION:
{question}
EMAIL CONVERSATION:
{email_conversation}
FORMAT:
{format}
"""
)
# Partially format the prompt: fill `format` with the PydanticOutputParser's format instructions.
prompt = prompt.partial(format=parser.get_format_instructions())
# Create the chain.
chain = prompt | llm
# Run the chain and print the results.
response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Please extract the main content from the email.",
    }
)
# The result is streamed in JSON format.
output = stream_response(response, return_output=True)
```json
{
    "person": "Kim Cheol-su",
    "email": "chulsoo.kim@bikecorporation.me",
    "subject": "\"ZENESIS\" bicycle distribution cooperation and meeting schedule proposal",
    "summary": "Kim Cheol-su, Managing Director of Bike Corporation, requested a detailed brochure on the new \"ZENESIS\" bicycle model from Lee Eun-chae of Teddy International and proposed a meeting to discuss cooperation. The meeting is proposed for next Tuesday (January 15) at 10 am at your office.",
    "date": "Next Tuesday (January 15) 10 am"
}
```
# Parse the result with the PydanticOutputParser.
structured_output = parser.parse(output)
print(structured_output)
person='Kim Cheol-su' email='chulsoo.kim@bikecorporation.me' subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal' summary='Kim Cheol-su, Managing Director of Bike Corporation, requested a detailed brochure on the new "ZENESIS" bicycle model from Lee Eun-chae of Teddy International and proposed a meeting to discuss cooperation. The meeting is proposed for next Tuesday (January 15) at 10 am at your office.' date='Next Tuesday (January 15) 10 am'
# Rebuild the whole chain with the output parser added.
chain = prompt | llm | parser
# Run the chain and print the results.
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Please extract the main content from the email.",
    }
)
# The result is returned as an EmailSummary object.
response
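# Assumption: bind the EmailSummary schema to the model with with_structured_output().
llm_with_structured = llm.with_structured_output(EmailSummary)
# Call invoke() and print the result.
answer = llm_with_structured.invoke(email_conversation)
answer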
EmailSummary(person='Kim Cheol-su', email='chulsoo.kim@bikecorporation.me', subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal', summary=' Kim Ji-soo is a biker ZENIS