TimeWeightedVectorStoreRetriever is a retriever that combines semantic similarity with time-based decay. This way, both the "freshness" and the "relevance" of documents are taken into account when producing results.
The scoring algorithm consists of:
semantic_similarity + (1.0 - decay_rate) ** hours_passed
Here, semantic_similarity is the semantic similarity between the query and the document, decay_rate is the rate at which the score decreases over time, and hours_passed is the number of hours that have elapsed since the object was last accessed.
The key feature of this approach is that it evaluates the "freshness" of information based on when an object was last accessed. In other words, frequently accessed objects keep high scores over time, which increases the likelihood that frequently used or important information appears at the top of the search results. This yields dynamic search results that account for both recency and relevance.
Note in particular that hours_passed is measured from the time the object was last accessed, not from when it was created. In other words, frequently accessed objects stay "fresh".
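The scoring formula above can be sketched as plain Python. The helper below is purely illustrative (the real computation happens inside TimeWeightedVectorStoreRetriever), but it shows how the recency bonus behaves:

```python
# A minimal sketch of the scoring formula above (illustration only;
# the actual scoring is performed internally by the retriever).
def time_weighted_score(semantic_similarity: float, decay_rate: float, hours_passed: float) -> float:
    # The recency term decays exponentially with the hours since last access.
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed

# A document accessed just now gets the full recency bonus...
fresh = time_weighted_score(0.8, 0.5, 0)    # 0.8 + 1.0 = 1.8
# ...while with the same decay_rate, a day-old access contributes almost nothing.
stale = time_weighted_score(0.8, 0.5, 24)
print(fresh, stale)
```

The similarity value 0.8 and decay_rate 0.5 are arbitrary example numbers chosen only to make the decay visible.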
# A configuration file for managing API keys as environment variables.
from dotenv import load_dotenv

# Load API key information
load_dotenv()
True
# Set up LangSmith tracking. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH11-Retriever")
Low decay_rate
A low decay rate (here set to a value extremely close to 0) means that documents are "remembered" for longer.
A decay rate of exactly 0 means documents are never forgotten, which makes this retriever equivalent to a plain vector lookup.
Initialize a TimeWeightedVectorStoreRetriever with the vector store, set the decay rate (decay_rate) to a very small value, and set the number of vectors to retrieve (k) to 1.
Add simple example data.
Perform a search by calling retriever.invoke().
"Please subscribe to Teddy Note." is returned first because it is the most salient document.
Since decay_rate is close to 0, that document is still considered recent.
High decay_rate
With a high decay_rate (e.g., 0.9999...), the recency score quickly converges to 0.
(If you set this value to exactly 1, the recency term becomes 0 for every document, and you again get the same result as a plain vector lookup.)
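The two extremes can be verified with plain arithmetic on the scoring formula (this is just the math, not the library itself):

```python
# decay_rate = 1 zeroes out the recency term for any elapsed time greater than 0...
print((1.0 - 1.0) ** 24)   # 0.0
# ...while decay_rate = 0 keeps the recency term at 1.0 no matter how much time passes,
# reducing the ranking to pure semantic similarity (a plain vector lookup).
print((1.0 - 0.0) ** 24)   # 1.0
```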
Initialize the retriever with TimeWeightedVectorStoreRetriever, setting decay_rate to 0.999 so the time weight decays quickly.
Add a new document again.
When retriever.invoke("teddy note") is called, "Would you like to subscribe to Teddy Note? Please!" is returned first. This is because the retriever has mostly "forgotten" the older document, "Please subscribe to Teddy Note."
Adjusting the decay rate (decay_rate)
When decay_rate is set to a very small value, such as 0.000001:
The decay rate (i.e., the rate at which information is forgotten) is very low, so the retriever rarely forgets information.
As a result, there is little difference in time weight between recent and old documents, and in this case higher scores go to documents with greater similarity.
When decay_rate is set close to 1, such as 0.999:
The decay rate (i.e., the rate at which information is forgotten) is very high, so almost all past information is forgotten.
In this case, higher scores go to the most recent information.
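The contrast between the two settings can be checked numerically with the recency term of the scoring formula (plain arithmetic, independent of the library):

```python
# With a tiny decay_rate, a day-old document keeps almost its full recency term...
low = (1.0 - 0.000001) ** 24
# ...but with decay_rate close to 1, the recency term vanishes within hours.
high = (1.0 - 0.999) ** 24
print(low, high)   # low is close to 1.0, high is effectively 0.0
```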
Adjusting decay_rate with virtual time
LangChain provides utilities that allow you to mock the time component.
mock_now is a utility function provided by LangChain that mocks the current time.
You can use it to test search results while changing the current time.
This makes it easier to find a suitable decay_rate.
[Caution] If you set the mocked time too far in the past, an error may occur while the decay is being calculated.
from datetime import datetime, timedelta
import faiss
from langchain.docstore import InMemoryDocstore
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
# Define an embedding model.
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
# Initializes the vector storage to an empty state.
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
# Initialize a time-weighted vector store retriever (here, applying a low decay rate).
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1
)
# Calculate yesterday's date.
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    # Add a document and set yesterday's date in metadata.
    [
        Document(
            page_content="Please subscribe to Teddy Note.",
            metadata={"last_accessed_at": yesterday},
        )
    ]
)
# Add another document. No metadata set separately.
retriever.add_documents([Document(page_content="Would you like to subscribe to Teddy Note? Please!")])
['a6c732c4-adb2-45d1-bcbb-a5108a9778f7']
# "Please subscribe to Teddy Note." is returned first because it is the most salient document.
# Since the decay rate is close to 0, the document is still considered recent.
retriever.invoke("teddy note")
# Define an embedding model.
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
# Initializes the vector storage to an empty state.
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
# Initialize a time-weighted vector store retriever (here, applying a high decay rate).
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.999, k=1
)
# Calculate yesterday's date.
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    # Add a document and set yesterday's date in metadata.
    [
        Document(
            page_content="Please subscribe to Teddy Note.",
            metadata={"last_accessed_at": yesterday},
        )
    ]
)
# Add another document. No metadata set separately.
retriever.add_documents([Document(page_content="Would you like to subscribe to Teddy Note? Please!")])
['c3349ba9-75c7-49ec-be7a-017bc0917fa2']
# Check results after search
retriever.invoke("teddy note")
import datetime
from langchain.utils import mock_now
# mock_now is a context manager; calling it on its own (without `with`)
# does not actually change the time.
mock_now(datetime.datetime(2024, 8, 30, 00, 00))

# Outside a `with mock_now(...)` block, the real current time is printed.
print(datetime.datetime.now())
2024-08-30 22:05:01.844175
# Change the current time to any time.
with mock_now(datetime.datetime(2024, 8, 29, 00, 00)):
    # Search documents at the mocked point in time.
    print(retriever.invoke("teddy note"))