Embeddings can be stored or temporarily cached to avoid recalculation.
Embeddings can be cached with CacheBackedEmbeddings. A cache-backed embedder is a wrapper around an embedder that caches embeddings in a key-value store: the text is hashed, and this hash is used as the key in the cache.
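As a rough mental model (this is only a sketch, not the exact key format the library uses), the cache key can be thought of as a namespace plus a hash of the text:

import hashlib

# Conceptual sketch only: the real key format used by CacheBackedEmbeddings may differ,
# but the idea is namespace + hash(text) -> stored embedding bytes
def cache_key(namespace: str, text: str) -> str:
    return namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()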
The main supported way to initialize a CacheBackedEmbeddings is from_bytes_store, which takes the following parameters:
underlying_embeddings : the embedder used to compute the embeddings.
document_embedding_cache : any ByteStore used for caching document embeddings.
namespace : (optional, defaults to "") the namespace used for the document cache. It is used to avoid collisions with other caches; for example, set it to the name of the embedding model in use.
Caution: to avoid collisions when the same text is embedded with different embedding models, make sure to set the namespace parameter.
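For example, if two embedding models share the same byte store, giving each its own namespace keeps their cached vectors separate. A minimal sketch (the model names below are only illustrative):

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore
from langchain_openai import OpenAIEmbeddings

shared_store = InMemoryByteStore()  # one key-value store shared by both embedders

# Each embedder writes under its own namespace, so the same text never collides
small_cached = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    document_embedding_cache=shared_store,
    namespace="text-embedding-3-small",
)
large_cached = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=OpenAIEmbeddings(model="text-embedding-3-large"),
    document_embedding_cache=shared_store,
    namespace="text-embedding-3-large",
)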
Embeddings with LocalFileStore (persistent storage)
First, let's look at an example that stores the embeddings on the local file system and uses FAISS as the vector store.
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
# Set up the underlying embedder using OpenAI embeddings
embedding = OpenAIEmbeddings()
# Setting up local file storage
store = LocalFileStore("./cache/")
# Create a cache-backed embedder from the base embedder and the file store
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embedding,
    document_embedding_cache=store,
    namespace=embedding.model,  # use the embedding model name as the cache namespace
)
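Before any documents are embedded, the cache is still empty. Assuming the ./cache/ directory starts out fresh, you can confirm this by listing the store's keys (yield_keys is part of the ByteStore interface):

# No keys yet, because nothing has been embedded
list(store.yield_keys())  # -> [] for a fresh cache directory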
Load a document, split it into chunks, embed each chunk, and load the results into the vector store. If the vector store is then created again, it is much faster because the embeddings do not need to be recomputed.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter

# Load the document
raw_documents = TextLoader("./data/appendix-keywords.txt").load()
# Configure a character-based text splitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# Split the document into chunks
documents = text_splitter.split_documents(raw_documents)

# Measure how long it takes to create the FAISS vector store from the documents
%time db = FAISS.from_documents(documents, cached_embedder)
CPU times: user 3.87 ms, sys: 1.49 ms, total: 5.35 ms
Wall time: 4.3 ms

# Create the FAISS vector store again, this time using the cached embeddings
%time db2 = FAISS.from_documents(documents, cached_embedder)
CPU times: user 4.22 ms, sys: 1.44 ms, total: 5.66 ms
Wall time: 4.55 ms

Using InMemoryByteStore (non-persistent)
To use a different ByteStore, simply pass it in when creating the CacheBackedEmbeddings.
Below is an example that creates the same cache-backed embedding object, this time using the non-persistent InMemoryByteStore.
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore
store = InMemoryByteStore() # Create an in-memory byte store
# Create a cache-backed embedder backed by the in-memory store
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
embedding, store, namespace=embedding.model
)
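As a quick check (reusing the documents list from the FAISS example above), the in-memory cached embedder is used in exactly the same way; the only difference is that the cache lives in process memory and disappears when the session ends:

# The first call computes the embeddings and fills the in-memory cache;
# repeating it in the same session reuses the cached vectors instead
db = FAISS.from_documents(documents, cached_embedder)

# One cache entry per embedded chunk, all gone once the Python process exits
print(len(list(store.yield_keys())))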