# 02. CacheBackedEmbeddings

## CacheBackedEmbeddings <a href="#cachebackedembeddings" id="cachebackedembeddings"></a>

Embeddings can be stored or temporarily cached to avoid recalculation.

Embeddings can be cached with `CacheBackedEmbeddings`, a wrapper around an embedder that stores embeddings in a key-value store. The text is hashed, and that hash is used as the key in the cache.
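The hashing-and-lookup idea can be sketched in a few lines (a toy illustration only, not LangChain's actual implementation; `ToyCachedEmbedder` and the lambda embedder are hypothetical):

```python
import hashlib

# Toy sketch of a cache-backed embedder (illustrative, not LangChain's code):
# the text is hashed, and the hash (prefixed with a namespace) is the cache key.
class ToyCachedEmbedder:
    def __init__(self, embed_fn, namespace=""):
        self.embed_fn = embed_fn    # underlying embedder
        self.namespace = namespace  # prefixed to every key
        self.store = {}             # stand-in for a ByteStore
        self.calls = 0              # counts real embedding computations

    def _key(self, text):
        return self.namespace + hashlib.sha1(text.encode()).hexdigest()

    def embed(self, text):
        key = self._key(text)
        if key not in self.store:   # cache miss: compute and store
            self.store[key] = self.embed_fn(text)
            self.calls += 1
        return self.store[key]

embedder = ToyCachedEmbedder(lambda t: [float(len(t))], namespace="toy-model")
embedder.embed("hello")
embedder.embed("hello")  # served from the cache, no recomputation
print(embedder.calls)    # -> 1
```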

The main way to initialize a `CacheBackedEmbeddings` is the `from_bytes_store` class method, which takes the following parameters:

* `underlying_embeddings` : the embedder used to compute embeddings.
* `document_embedding_cache` : any `ByteStore` used for caching document embeddings.
* `namespace` : (optional, defaults to `""`) the namespace used for the document cache. It prevents collisions with other caches; for example, set it to the name of the embedding model being used.

**Caution** : to avoid collisions when the same text is embedded with different embedding models, it is important to set the `namespace` parameter.
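To see why this matters: the cache key is the namespace plus the text hash, so two models embedding the same text get distinct keys (a minimal sketch; the key scheme here is illustrative, not LangChain's exact format):

```python
import hashlib

# Illustrative key scheme: namespace + hash of the text.
def cache_key(namespace, text):
    return namespace + hashlib.sha1(text.encode()).hexdigest()

# The same text under two model namespaces yields two distinct keys,
# so their cached embeddings can never collide.
k1 = cache_key("text-embedding-ada-002", "hello")
k2 = cache_key("text-embedding-3-small", "hello")
print(k1 != k2)  # -> True
```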

### Embedding with LocalFileStore (persistent storage) <a href="#localfilestore" id="localfilestore"></a>

First, let's look at an example that stores the embeddings on the local file system and uses the FAISS vector store.

```
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings

# Set up the underlying embeddings using OpenAI
embedding = OpenAIEmbeddings()

# Setting up local file storage
store = LocalFileStore("./cache/")

# Generating embeddings that support cache
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embedding,
    document_embedding_cache=store,
    namespace=embedding.model,  # Use the embedding model name as the namespace to avoid collisions
)
```

```
# Iterate over the keys in the store
list(store.yield_keys())
```

```
['text-embedding-ada-0020fd71f95-1342-512d-9d5b-3e3ab3c6bbe0',
 'text-embedding-ada-00274ae75af-9058-555e-aefa-082f0b4e05...',
 'text-embedding-ada-0029db9e1cd-62d8-50fc-94f4-24bef3cacaf5',
 'version.txt',
 'text-embedding-ada-002cc824f84-d691-544f-9d9c-ca7e...']
```

Load the document, split it into chunks, embed each chunk, and load the chunks into the vector store.

```
from langchain.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Load Document
raw_documents = TextLoader("./data/appendix-keywords.txt").load()
# Set up a character-based text splitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# Split document
documents = text_splitter.split_documents(raw_documents)
```

```
from langchain_community.vectorstores import FAISS

# Measure code execution time.
%time db = FAISS.from_documents(documents, cached_embedder)  # Create a FAISS vector store from the documents
```

```
CPU times: user 3.87 ms, sys: 1.49 ms, total: 5.35 ms 
Wall time: 4.3 ms 
```

When the vector store is created again, it is much faster because the embeddings are read from the cache instead of being recalculated.

```
# Create a FAISS vector store using the cached embeddings
%time db2 = FAISS.from_documents(documents, cached_embedder)
```

```
 CPU times: user 4.22 ms, sys: 1.44 ms, total: 5.66 ms 
Wall time: 4.55 ms 
```

### Using `InMemoryByteStore` (non-persistent) <a href="#inmemorybytestore" id="inmemorybytestore"></a>

To use a different `ByteStore`, simply pass it in when creating the `CacheBackedEmbeddings`.

Below, the same cache-backed embedder is created with the non-persistent `InMemoryByteStore`.

```
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()  # Create an in-memory byte store

# Create a cache-backed embedder
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    embedding, store, namespace=embedding.model
)
```
