01. Pydantic output parser (PydanticOutputParser)

PydanticOutputParser

PydanticOutputParser is a class that helps convert the output of a language model into structured information. Instead of a free-form text response, it provides the information users need in a clear and systematic form.

By utilizing this class, the output of the language model is converted to a specific data model, making it easier to process and utilize information.

PydanticOutputParser (like most OutputParsers) must implement two key methods.

get_format_instructions() : Provides instructions that define the format of the information the language model should output. For example, it can return, as a string, the fields the language model must output along with a description of their shape. These instructions play a very important role: by following them, the language model can structure its output so that it can be converted into a specific data model.
parse() : Accepts the output of the language model (assumed to be a string), then analyzes it and converts it into a specific structure. Using a tool such as Pydantic, the input string is validated against a predefined schema and converted into a data structure that follows that schema.
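
The division of labor between these two methods can be sketched with Pydantic alone. The following is a simplified stand-in that assumes Pydantic v2. The `Person` model and the helper names `format_instructions` and `parse_output` are hypothetical and are not LangChain's actual implementation.

```python
import json

from pydantic import BaseModel, Field


class Person(BaseModel):
    # Hypothetical model used only for this illustration.
    name: str = Field(description="Full name of the person")
    age: int = Field(description="Age in years")


def format_instructions(model: type[BaseModel]) -> str:
    # Plays the role of get_format_instructions(): describe the expected shape as text.
    schema = json.dumps(model.model_json_schema())
    return f"Return a JSON object that conforms to this schema:\n{schema}"


def parse_output(model: type[BaseModel], text: str) -> BaseModel:
    # Plays the role of parse(): validate the raw string against the schema.
    return model.model_validate_json(text)


instructions = format_instructions(Person)
person = parse_output(Person, '{"name": "Ada", "age": 36}')
print(instructions)
print(person.name, person.age)
```

The instruction string is what would be placed into the prompt, and the parsing step is what would run on the model's reply.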

Reference - Pydantic official documentation

from dotenv import load_dotenv

load_dotenv()

True

# Set up LangSmith tracking. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH03-OutputParser")

 Start tracking LangSmith. 
[Project name] 
CH03-OutputParser

# Import the helper for real-time (streaming) output
from langchain_teddynote.messages import stream_response

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

Here is an example of the email body.

email_conversation = """From: Kim Cheol-su (chulsoo.kim@bikecorporation.me)
To: Lee Eun-chae (eunchae@teddyinternational.me)
Subject: "ZENESIS" Bicycle distribution cooperation and meeting schedule proposal

Hello, Eunchae Lee,

I am Kim Cheol-su, managing director of Bike Corporation. We learned about your company's new bicycle, "ZENESIS", through a recent press release. Bike Corporation is a leader in innovation and quality in the bicycle manufacturing and distribution industry, with long-term experience and expertise in this field.

We would like to request a detailed brochure on the ZENESIS model, especially information on technical specifications, battery performance, and design aspects. This will help us to refine our proposed distribution strategy and marketing plan.

Also, I would like to propose a meeting next Tuesday (January 15th) at 10am to discuss the possibility of cooperation in more depth. Would it be possible to meet at your office and talk?

Thank you.

Kim Cheol-su
Managing Director
Bike Corporation
"""

Example when not using output parser

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Please extract important content from the following email.\n\n{email_conversation}"
)

llm = ChatOpenAI(temperature=0, model_name="gpt-4o")

chain = prompt | llm

answer = chain.stream({"email_conversation": email_conversation})

output = stream_response(answer, return_output=True)

**Important content extraction:** 

1. **Sender:** Kim Cheol-su (chulsoo.kim@bikecorporation.me)
2. **Receiver:** Lee Eun-chae (eunchae@teddyinternational.me)
3. **Title:** "ZENESIS" bicycle distribution cooperation and meeting schedule proposal 
4. **Requirements:** 
   -Request detailed brochure for ZENESIS model (with technical specifications, battery performance, design information) 
5. **Meeting proposal:** 
   -Date: Next Tuesday (15th January) 
   -Time: 10 am 
   -Place: Your office

print(output)

**Important content extraction:** 

1. **Sender:** Kim Cheol-su (chulsoo.kim@bikecorporation.me)
2. **Receiver:** Lee Eun-chae (eunchae@teddyinternational.me)
3. **Title:** "ZENESIS" bicycle distribution cooperation and meeting schedule proposal 
4. **Requirements:** 
   -Request detailed brochure for ZENESIS model (with technical specifications, battery performance, design information) 
5. **Meeting proposal:** 
   -Date: Next Tuesday (15th January) 
   -Time: 10 am 
   -Place: Your office

Given the above email content, we will parse the information in the email using the class defined in the Pydantic style below.

Note that the description inside each Field explains which piece of key information to extract from the text-form answer. The LLM reads this description to extract the information it needs, so the description should be accurate and clear.

class EmailSummary(BaseModel):
    person: str = Field(description="Sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="Email Subject")
    summary: str = Field(description="Text summarizing the body of the email")
    date: str = Field(description="Meeting date and time mentioned in the body of the email")


# Create the PydanticOutputParser.
parser = PydanticOutputParser(pydantic_object=EmailSummary)

# Print the format instructions.
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type"
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema: 

{"properties": {"person": {"title": "Person", "description": "Sender of the email", "type": "string"}, "email": {"title": "Email", "description": "The email address of the sender", "type": "string"}, "subject": {"title": "Subject", "description": "Email Subject", "type": "string"}, "summary": {"title": "Summary", "description": "Text summarizing the body of the email", "type": "string"}, "date": {"title": "Date", "description": "Meeting date and time mentioned in the body of the email", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}

Define the prompt.

question : Receives the user's question.
email_conversation : Receives the contents of the email body.
format : Specifies the output format.

prompt = PromptTemplate.from_template(
    """
You are a helpful assistant. Please answer the following questions in KOREAN.

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)

# Fill in the format variable ahead of time (partial formatting)
# with the PydanticOutputParser's format instructions.
prompt = prompt.partial(format=parser.get_format_instructions())
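
Partial formatting fills some template variables now and leaves the rest for later. The idea can be sketched with the standard library's `string.Template`, used here only as a conceptual stand-in for `PromptTemplate.partial()`. The template text and the variable values are made up for illustration.

```python
from string import Template

# A template with three placeholders, analogous to the prompt above.
template = Template("QUESTION:\n$question\n\nEMAIL:\n$email\n\nFORMAT:\n$format")

# Step 1: fill only `format`; safe_substitute leaves unknown placeholders untouched.
partially_filled = Template(template.safe_substitute(format="Return JSON."))

# Step 2: supply the remaining variables at call time.
final_prompt = partially_filled.substitute(
    question="Who sent the email?",
    email="From: Kim Cheol-su ...",
)
print(final_prompt)
```

The benefit is the same in both cases: the caller only has to pass the values that change from request to request.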

Next, create the chain.

# Create the chain.
chain = prompt | llm

Run the chain and check the results.

# Run the chain and print the results.
response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Please extract the main content from the email.",
    }
)

# The results are output in JSON format.
output = stream_response(response, return_output=True)

```json
{
  "person": "Kim Cheol-su",
  "email": "chulsoo.kim@bikecorporation.me",
  "subject": "\"ZENESIS\" bicycle distribution cooperation and meeting schedule proposal",
  "summary": "Managing Director Kim Cheol-su of Bike Corporation sent Lee Eun-chae of Teddy International a request for a detailed brochure on the new \"ZENESIS\" bicycle model and a proposal to discuss cooperation. The meeting will be held at your office at 10 am next Tuesday (15 January).",
  "date": "Next Tuesday (January 15) 10 am" 
} 
```
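
Note that the streamed text is wrapped in a Markdown `json` fence, so it is not directly valid JSON. The sketch below shows one way to strip such a fence with the standard library before decoding. `extract_json` is a hypothetical helper and is not how LangChain implements this step internally.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to avoid a literal fence in the example


def extract_json(text: str) -> dict:
    # Capture the body of a fenced block if present; otherwise use the text as-is.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


raw = FENCE + 'json\n{"person": "Kim Cheol-su", "date": "January 15, 10 am"}\n' + FENCE
data = extract_json(raw)
print(data["person"])
```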

Finally, use the parser to parse the result and convert it into an EmailSummary object.

# Parse the result with the PydanticOutputParser.
structured_output = parser.parse(output)
print(structured_output)

person='Kim Cheol-su' email='chulsoo.kim@bikecorporation.me' subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal' summary='Managing Director Kim Cheol-su of Bike Corporation sent Lee Eun-chae of Teddy International a request for a detailed brochure on the new "ZENESIS" bicycle model and a proposal to discuss cooperation. The meeting will be held at your office at 10 am next Tuesday (15 January).' date='Next Tuesday (January 15) 10 am'
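
Parsing can fail when the model's answer omits a required field or uses the wrong type. The sketch below reproduces that situation with Pydantic alone, assuming Pydantic v2. The `Meeting` model and the JSON strings are hypothetical. In a LangChain chain the parser typically surfaces such failures as an `OutputParserException`, so the call is worth wrapping in error handling.

```python
from pydantic import BaseModel, Field, ValidationError


class Meeting(BaseModel):
    # Hypothetical schema for illustration only.
    person: str = Field(description="Sender of the email")
    date: str = Field(description="Meeting date and time")


complete = '{"person": "Kim Cheol-su", "date": "January 15, 10 am"}'
incomplete = '{"person": "Kim Cheol-su"}'  # "date" is missing

meeting = Meeting.model_validate_json(complete)
print(meeting.date)

try:
    Meeting.model_validate_json(incomplete)
    missing_fields = []
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    missing_fields = [err["loc"][0] for err in exc.errors()]

print(missing_fields)
```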

Create chain with parser added

By adding the parser to the chain, the output is returned directly as the Pydantic object defined above.

# Rebuild the entire chain with the output parser added.
chain = prompt | llm | parser

# Run the chain and print the results.
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Please extract the main content from the email.",
    }
)

# The result is returned as an EmailSummary object.
response
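
Because the chain now returns a Pydantic object rather than a string, downstream code can use attribute access and serialization directly. A brief sketch, assuming Pydantic v2 (`model_dump()` and `model_dump_json()`). `EmailSummaryDemo` is a trimmed-down, hypothetical copy of the model, and the object is built by hand with made-up values instead of calling the LLM.

```python
from pydantic import BaseModel, Field


class EmailSummaryDemo(BaseModel):
    # Trimmed-down, hypothetical copy of EmailSummary for illustration.
    person: str = Field(description="Sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="Email Subject")


# Stand-in for the object the chain would return (values are illustrative).
demo_response = EmailSummaryDemo(
    person="Kim Cheol-su",
    email="chulsoo.kim@bikecorporation.me",
    subject="ZENESIS distribution proposal",
)

print(demo_response.person)  # attribute access instead of string parsing
as_dict = demo_response.model_dump()  # plain dict, e.g. for storage
as_json = demo_response.model_dump_json()  # JSON string, e.g. for an API response
print(as_dict["email"])
```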

with_structured_output( )

By calling .with_structured_output(Pydantic) on the model, you attach the output parsing step to the model itself and can convert the output into a Pydantic object.

llm_with_structured = ChatOpenAI(
    temperature=0, model_name="gpt-4o"
).with_structured_output(EmailSummary)

# Call invoke() and print the result.
answer = llm_with_structured.invoke(email_conversation)
answer

EmailSummary(person='Kim Cheol-su', email='chulsoo.kim@bikecorporation.me', subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal', summary='…

Reference

One drawback is that the .with_structured_output() function does not support the stream() feature.


Last updated 1 year ago
