12. Comparing Experiments (Pairwise Evaluation)

Pairwise Evaluation

Some evaluations aim to compare two or more LLM applications (or models) against each other.

Chatbot Arena is a familiar example of this comparative evaluation method, and the same approach underlies many LLM leaderboards.

# installation
# !pip install -qU langsmith langchain-teddynote
# Use a .env configuration file to manage the API KEY as an environment variable
from dotenv import load_dotenv

# Load the API KEY information
load_dotenv()
True
# Set up LangSmith tracking. https://smith.langchain.com
# !pip install -qU langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH16-Evaluations")
Starting LangSmith tracking.
[Project Name]
CH16-Evaluations

Now we can create a dataset from these example runs.

Only the inputs need to be saved; each experiment will generate its own outputs.
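A minimal sketch of this step with the LangSmith SDK might look like the following; the dataset name and example questions are placeholders, not values from the original tutorial.

from langsmith import Client

client = Client()

# Example questions whose answers we want to compare across experiments (placeholder data)
questions = [
    "What is LangSmith used for?",
    "How does pairwise evaluation differ from single-run evaluation?",
]

# Create the dataset and store only the inputs
dataset = client.create_dataset(
    dataset_name="MODEL_COMPARE_EVAL",  # hypothetical dataset name
    description="Dataset for comparing answers from two LLM experiments",
)
client.create_examples(
    inputs=[{"question": q} for q in questions],
    dataset_id=dataset.id,
)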

Perform comparative evaluations.
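Assuming two experiments have already been run against the dataset, the comparison can be sketched with an LLM-as-judge pairwise evaluator passed to langsmith's evaluate_comparative. The experiment names, the judge prompt, and the "answer" output key below are assumptions for illustration, not the tutorial's exact code.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langsmith.evaluation import evaluate_comparative


def evaluate_pairwise(runs: list, example) -> dict:
    """LLM-as-judge evaluator that prefers one of two candidate answers."""
    # Candidate answers produced by the two experiments (output key "answer" is an assumption)
    answer_a = runs[0].outputs["answer"]
    answer_b = runs[1].outputs["answer"]
    question = example.inputs["question"]

    # Judge prompt: ask the model to pick A or B (hypothetical prompt)
    judge = ChatPromptTemplate.from_template(
        "You are a judge comparing two answers to a question.\n"
        "Question: {question}\n"
        "Answer A: {answer_a}\n"
        "Answer B: {answer_b}\n"
        "Which answer is better? Reply with exactly 'A' or 'B'."
    ) | ChatOpenAI(model="gpt-4o-mini", temperature=0)

    verdict = judge.invoke(
        {"question": question, "answer_a": answer_a, "answer_b": answer_b}
    ).content.strip()

    # Map the verdict to per-run scores keyed by run id (1 = preferred, 0 = not preferred)
    scores = {
        runs[0].id: 1 if verdict == "A" else 0,
        runs[1].id: 1 if verdict == "B" else 0,
    }
    return {"key": "ranked_preference", "scores": scores}


# Compare two existing experiments by name or ID (placeholder names below)
evaluate_comparative(
    ["MODEL_COMPARE_EVAL-experiment-a", "MODEL_COMPARE_EVAL-experiment-b"],
    evaluators=[evaluate_pairwise],
)

The results appear in LangSmith as a comparative experiment, where each example shows which experiment's answer the judge preferred.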
