TimeWeightedVectorStoreRetriever is a retriever that combines semantic similarity with time-based decay. This way, both the "freshness" and the "relevance" of documents are taken into account when producing results.
The scoring algorithm consists of:
semantic_similarity + (1.0 - decay_rate) ** hours_passed
Here, semantic_similarity is the semantic similarity between the query and the document, decay_rate is the rate at which the score decreases over time, and hours_passed is the number of hours that have elapsed since the object was last accessed.
The key feature of this approach is that it evaluates the "freshness" of information based on when an object was last accessed. In other words, frequently accessed objects keep high scores over time, which increases the likelihood that frequently used or important information appears at the top of the search results. This yields dynamic search results that account for both recency and relevance.
Note in particular that hours_passed is measured from the time the object was last accessed, not from when it was created. In other words, frequently accessed objects stay "fresh".
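The scoring formula above can be sketched as plain Python. The helper below is purely illustrative (the real computation happens inside TimeWeightedVectorStoreRetriever), but it shows how the recency bonus behaves:

```python
# A minimal sketch of the scoring formula above (illustration only;
# the actual scoring is performed internally by the retriever).
def time_weighted_score(semantic_similarity: float, decay_rate: float, hours_passed: float) -> float:
    # The recency term decays exponentially with the hours since last access.
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed

# A document accessed just now gets the full recency bonus...
fresh = time_weighted_score(0.8, 0.5, 0)    # 0.8 + 1.0 = 1.8
# ...while with the same decay_rate, a day-old access contributes almost nothing.
stale = time_weighted_score(0.8, 0.5, 24)
print(fresh, stale)
```

The similarity value 0.8 and decay_rate 0.5 are arbitrary example numbers chosen only to make the decay visible.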
# A configuration file for managing API keys as environment variables.
from dotenv import load_dotenv

# Load API key information
load_dotenv()
True
# Set up LangSmith tracking. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH11-Retriever")
Low decay_rate
A low decay rate (here set to a value extremely close to 0) means that documents are "remembered" for longer.
A decay rate of exactly 0 means documents are never forgotten, which makes this retriever equivalent to a plain vector lookup.
Initialize a TimeWeightedVectorStoreRetriever with the vector store, set the decay rate (decay_rate) to a very small value, and set the number of vectors to retrieve (k) to 1.
Add simple example data.
Perform a search by calling retriever.invoke().
"Please subscribe to Teddy Note." is returned first because it is the most salient document.
Since decay_rate is close to 0, that document is still considered recent.
High decay_rate
With a high decay_rate (e.g., 0.9999...), the recency score quickly converges to 0.
(If you set this value to exactly 1, the recency term becomes 0 for every document, and you again get the same result as a plain vector lookup.)
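The two extremes can be verified with plain arithmetic on the scoring formula (this is just the math, not the library itself):

```python
# decay_rate = 1 zeroes out the recency term for any elapsed time greater than 0...
print((1.0 - 1.0) ** 24)   # 0.0
# ...while decay_rate = 0 keeps the recency term at 1.0 no matter how much time passes,
# reducing the ranking to pure semantic similarity (a plain vector lookup).
print((1.0 - 0.0) ** 24)   # 1.0
```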
Initialize the retriever with TimeWeightedVectorStoreRetriever, setting decay_rate to 0.999 so the time weight decays quickly.
Add a new document again.
When retriever.invoke("teddy note") is called, "Would you like to subscribe to Teddy Note? Please!" is returned first. This is because the retriever has mostly "forgotten" the older document, "Please subscribe to Teddy Note."
Adjusting the decay rate (decay_rate)
When decay_rate is set to a very small value, such as 0.000001:
The decay rate (i.e., the rate at which information is forgotten) is very low, so the retriever rarely forgets information.
As a result, there is little difference in time weight between recent and old documents, and in this case higher scores go to documents with greater similarity.
When decay_rate is set close to 1, such as 0.999:
The decay rate (i.e., the rate at which information is forgotten) is very high, so almost all past information is forgotten.
In this case, higher scores go to the most recent information.
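The contrast between the two settings can be checked numerically with the recency term of the scoring formula (plain arithmetic, independent of the library):

```python
# With a tiny decay_rate, a day-old document keeps almost its full recency term...
low = (1.0 - 0.000001) ** 24
# ...but with decay_rate close to 1, the recency term vanishes within hours.
high = (1.0 - 0.999) ** 24
print(low, high)   # low is close to 1.0, high is effectively 0.0
```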
Adjusting decay_rate with virtual time
LangChain provides utilities that allow you to mock the time component.
mock_now is a utility function provided by LangChain that mocks the current time.
You can use it to test search results while changing the current time.
This makes it easier to find a suitable decay_rate.
[Caution] If you set the mocked time too far in the past, an error may occur while the decay is being calculated.
from datetime import datetime, timedelta
import faiss
from langchain.docstore import InMemoryDocstore
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
# Define an embedding model.
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
# Initializes the vector storage to an empty state.
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
# Initialize a time-weighted vector store retriever (here, applying a low decay rate).
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1
)
# Calculate yesterday's date.
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    # Add a document and set yesterday's date in metadata.
    [
        Document(
            page_content="Please subscribe to Teddy Note.",
            metadata={"last_accessed_at": yesterday},
        )
    ]
)
# Add another document. No metadata set separately.
retriever.add_documents([Document(page_content="Would you like to subscribe to Teddy Note? Please!")])
['a6c732c4-adb2-45d1-bcbb-a5108a9778f7']
# "Please subscribe to Teddy Note." is returned first because it is the most salient document.
# Since the decay rate is close to 0, the document is still considered recent.
retriever.invoke("teddy note")
# Define an embedding model.
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small")
# Initializes the vector storage to an empty state.
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
# Initialize a time-weighted vector store retriever (here, applying a high decay rate).
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, decay_rate=0.999, k=1
)
# Calculate yesterday's date.
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
    # Add a document and set yesterday's date in metadata.
    [
        Document(
            page_content="Please subscribe to Teddy Note.",
            metadata={"last_accessed_at": yesterday},
        )
    ]
)
# Add another document. No metadata set separately.
retriever.add_documents([Document(page_content="Would you like to subscribe to Teddy Note? Please!")])
['c3349ba9-75c7-49ec-be7a-017bc0917fa2']
# Check results after search
retriever.invoke("teddy note")
import datetime
from langchain.utils import mock_now
# mock_now is a context manager; calling it on its own (without `with`)
# does not actually change the time.
mock_now(datetime.datetime(2024, 8, 30, 00, 00))

# Outside a `with mock_now(...)` block, the real current time is printed.
print(datetime.datetime.now())
2024-08-30 22:05:01.844175
# Change the current time to any time.
with mock_now(datetime.datetime(2024, 8, 29, 00, 00)):
    # Search documents at the mocked point in time.
    print(retriever.invoke("teddy note"))