01. Chroma
This notebook covers how to get started with the Chroma vector store.
Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.
# Configuration for managing API keys as environment variables.
from dotenv import load_dotenv
# Load the API key information.
load_dotenv()
True
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging
# Enter a project name.
logging.langsmith("CH10-VectorStores")
Load the sample dataset.
VectorStore creation
Creating a vector store (from_documents)
The from_documents class method creates a vector store from a list of documents.
Parameters
documents (List[Document]): List of documents to add to the vector store
embedding (Optional[Embeddings]): Embedding function. The default is None
ids (Optional[List[str]]): List of document IDs. The default is None
collection_name (str): Name of the collection to create
persist_directory (Optional[str]): Directory in which to store the collection. The default is None
client_settings (Optional[chromadb.config.Settings]): Chroma client settings
client (Optional[chromadb.Client]): Chroma client instance
collection_metadata (Optional[Dict]): Collection configuration information. The default is None
Notes
If persist_directory is specified, the collection is stored in that directory. If it is not specified, the data is held temporarily in memory.
This method internally calls the from_texts method to create the vector store.
The Document's page_content is used as the text, and its metadata is used as the metadata.
Return value
Chroma: The created Chroma vector store instance
When creating the store, pass a list of Document objects as the documents parameter. Specify the embedding model to use for embedding; you can also specify collection_name, which plays the role of a namespace.
When persist_directory is specified, the data is saved to disk as files.
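The following is a minimal sketch of creating a persisted store with from_documents, assuming the langchain-chroma package is installed. The helper name build_store, the path "./chroma_db", and the collection name "my_db" are illustrative assumptions, not values taken from this tutorial; the embedding argument can be any LangChain Embeddings implementation.

```python
def build_store(documents, embedding, db_path="./chroma_db", collection_name="my_db"):
    """Create a Chroma collection from Document objects and persist it to disk."""
    # Lazy import: requires the langchain-chroma package (assumed installed).
    from langchain_chroma import Chroma

    return Chroma.from_documents(
        documents=documents,              # List[Document]
        embedding=embedding,              # embedding model used for the documents
        collection_name=collection_name,  # plays the role of a namespace
        persist_directory=db_path,        # pass None to keep the data in memory only
    )
```

Calling build_store(docs, embedding) with a list of Document objects should leave the collection files under db_path, so the store can be reopened later.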
Running the code below loads the data stored in DB_PATH.
Check the data stored in the loaded VectorStore.
If you specify a different collection_name, you get no results because no data is stored under that name.
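Reopening a persisted collection and checking its contents could look like the sketch below. It assumes the langchain-chroma package is installed; load_store, count_stored, and the default path and collection name are hypothetical names chosen for illustration.

```python
def load_store(embedding, db_path="./chroma_db", collection_name="my_db"):
    """Open a collection that was previously persisted under db_path."""
    # Lazy import: requires the langchain-chroma package (assumed installed).
    from langchain_chroma import Chroma

    return Chroma(
        persist_directory=db_path,
        embedding_function=embedding,
        collection_name=collection_name,
    )


def count_stored(store):
    """Return the number of entries in the collection, based on the ids from get()."""
    return len(store.get()["ids"])
```

If collection_name does not match the name used when the data was written, count_stored is expected to return 0, which corresponds to the empty result described above.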
Creating a vector store (from_texts)
The from_texts class method creates a vector store from a list of texts.
Parameters
texts (List[str]): List of texts to add to the collection
embedding (Optional[Embeddings]): Embedding function. The default is None
metadatas (Optional[List[dict]]): List of metadata. The default is None
ids (Optional[List[str]]): List of document IDs. The default is None
collection_name (str): Name of the collection to create. The default is '_LANGCHAIN_DEFAULT_COLLECTION_NAME'
persist_directory (Optional[str]): Directory in which to store the collection. The default is None
client_settings (Optional[chromadb.config.Settings]): Chroma client settings
client (Optional[chromadb.Client]): Chroma client instance
collection_metadata (Optional[Dict]): Collection configuration information. The default is None
Notes
If persist_directory is specified, the collection is stored in that directory. If it is not specified, the data is held temporarily in memory.
If ids is not provided, IDs are generated automatically using UUIDs.
Return value
The created vector store instance
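A short sketch of from_texts with explicit metadata and IDs follows, assuming the langchain-chroma package is installed. The helper name build_store_from_texts and the collection name "my_texts" are assumptions made for this example.

```python
def build_store_from_texts(texts, embedding, metadatas=None, ids=None,
                           collection_name="my_texts"):
    """Create a Chroma collection directly from raw strings."""
    # Lazy import: requires the langchain-chroma package (assumed installed).
    from langchain_chroma import Chroma

    return Chroma.from_texts(
        texts=texts,                # List[str]
        embedding=embedding,        # embedding model used for the texts
        metadatas=metadatas,        # optional, one dict per text
        ids=ids,                    # optional; generated automatically when None
        collection_name=collection_name,
    )
```

Because persist_directory is not passed here, the resulting collection lives in memory only.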
Similarity search
The similarity_search method performs a similarity search in the Chroma database. It returns the documents most similar to the given query.
Parameters
query (str): Query text to search for
k (int, optional): Number of results to return. The default is 4.
filter (Dict[str, str], optional): Filter by metadata. The default is None.
Notes
You can adjust k to get the desired number of results.
You can use the filter parameter to search only documents that meet certain metadata conditions.
This method returns only the documents, without score information. If you also need the scores, use the similarity_search_with_score method directly.
Return value
List[Document]: List of documents most similar to the query text
You can specify the number of search results with k.
You can use metadata information in filter to narrow the search results.
Next, use a different source in filter to check the search results.
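The three search variants described above can be sketched as follows. The function works on an existing Chroma store passed in as db; run_searches is a hypothetical helper, and the metadata key "source" is assumed from the mention of source in the text.

```python
def run_searches(db, query, source):
    """Run a default search, a top-2 search, and a metadata-filtered search."""
    # Default call: k is left at its default of 4.
    default_hits = db.similarity_search(query)
    # Limit the number of returned documents with k.
    top_two = db.similarity_search(query, k=2)
    # Restrict the search to documents whose "source" metadata matches.
    filtered = db.similarity_search(query, k=2, filter={"source": source})
    return default_hits, top_two, filtered
```

Calling the function again with a different source value is a quick way to confirm that the filter really changes which documents come back.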
Adding documents to the vector store
The add_documents method adds documents to the vector store or updates existing ones.
Parameters
documents (List[Document]): List of documents to add to the vector store
**kwargs: Additional keyword arguments
  ids: List of document IDs (takes priority over the IDs carried by the documents themselves)
Notes
The add_texts method must be implemented.
The Document's page_content is used as the text, and its metadata is used as the metadata.
If a document has an ID and no IDs are provided in kwargs, the document's ID is used.
A ValueError is raised if the number of IDs in kwargs does not match the number of documents.
Return value
List[str]: List of IDs of the added texts
Exceptions
NotImplementedError: Raised when the add_texts method is not implemented
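Adding documents under explicit IDs might look like the sketch below, which takes an existing Chroma store as db. The helper name upsert_documents is hypothetical, and the early length check is an addition for this example that mirrors the ValueError case described above rather than the library's own code.

```python
from typing import List


def upsert_documents(db, documents, ids) -> List[str]:
    """Add documents under explicit ids and return the ids that were written."""
    if len(ids) != len(documents):
        # Fail early with a clear message instead of relying on the store's error.
        raise ValueError("ids and documents must have the same length")
    # ids passed here take priority over any id set on the documents themselves.
    return db.add_documents(documents=documents, ids=ids)
```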
The add_texts method embeds texts and adds them to the vector store.
Parameters
texts (Iterable[str]): List of texts to add to the vector store
metadatas (Optional[List[dict]]): List of metadata. The default is None
ids (Optional[List[str]]): List of document IDs. The default is None
Notes
If ids is not provided, IDs are generated automatically using UUIDs.
If an embedding function is set, the texts are embedded.
If metadata is provided:
Texts with metadata and texts without metadata are separated and processed.
For texts without metadata, the metadata is filled with an empty dictionary.
An upsert is performed on the collection to add the texts, embeddings, and metadata.
Return value
List[str]: List of IDs of the added texts
Exceptions
ValueError: Raised when an error occurs because of complex metadata, together with a message pointing to the metadata filtering method
When a text is added under an existing ID, an upsert is performed and the existing document is replaced.
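The bookkeeping described in the notes above (automatic UUIDs, empty metadata for texts without it, and replacement on an existing ID) can be illustrated with a toy model. This is a plain-dictionary sketch under stated assumptions, not Chroma's implementation: toy_add_texts and the dictionary layout are invented for this example, and no embedding takes place.

```python
import uuid
from typing import Dict, Iterable, List, Optional


def toy_add_texts(collection: Dict[str, dict], texts: Iterable[str],
                  metadatas: Optional[List[dict]] = None,
                  ids: Optional[List[str]] = None) -> List[str]:
    """Upsert texts into a dict keyed by id and return the ids used."""
    texts = list(texts)
    if ids is None:
        # No ids supplied: generate one UUID string per text.
        ids = [str(uuid.uuid4()) for _ in texts]
    if metadatas is None:
        metadatas = [{} for _ in texts]
    else:
        # Texts without metadata get an empty dictionary.
        padded = [m or {} for m in metadatas]
        metadatas = padded + [{}] * (len(texts) - len(padded))
    for id_, text, meta in zip(ids, texts, metadatas):
        # Assigning by key is the upsert: an existing id is overwritten.
        collection[id_] = {"text": text, "metadata": meta}
    return ids
```

Adding a second text under an ID that is already present leaves the collection size unchanged and replaces the stored text, which is the upsert behavior described above.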
Deleting documents from the vector store
The delete method deletes the documents with the specified IDs from the vector store.
Parameters
ids (Optional[List[str]]): List of IDs of the documents to delete. The default is None
Notes
This method internally calls the collection's delete method.
If ids is None, it does nothing.
Return value
None
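Since delete returns None, one way to confirm its effect is to compare the stored IDs before and after the call. The sketch below assumes an existing Chroma store passed in as db; delete_by_ids is a hypothetical helper name.

```python
def delete_by_ids(db, ids):
    """Delete the given ids and return how many entries were removed."""
    before = len(db.get()["ids"])  # ids currently in the collection
    db.delete(ids=ids)             # returns None
    after = len(db.get()["ids"])
    return before - after
```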
reset_collection
The reset_collection method resets the collection of the vector store to its initial state.
Converting the vector store to a Retriever
The as_retriever method creates a VectorStoreRetriever based on the vector store.
Parameters
**kwargs: Keyword arguments to pass to the search function
search_type (Optional[str]): Search type ("similarity", "mmr", "similarity_score_threshold")
search_kwargs (Optional[Dict]): Additional arguments to pass to the search function
  k: Number of documents to return (default: 4)
  score_threshold: Minimum similarity threshold
  fetch_k: Number of documents to pass to the MMR algorithm (default: 20)
  lambda_mult: Controls the diversity of MMR results (0 to 1, default: 0.5)
  filter: Filter on document metadata
Return value
VectorStoreRetriever: Retriever instance based on the vector store
Create the DB.
Running a similarity search with the default settings retrieves four documents.
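With no arguments, the retriever uses plain similarity search and the default k of 4. A minimal sketch, assuming db is an existing Chroma store; default_retrieve is a hypothetical helper name.

```python
def default_retrieve(db, query):
    """Build a retriever with default settings and run one query through it."""
    retriever = db.as_retriever()   # similarity search with the default k
    return retriever.invoke(query)  # returns a list of Document objects
```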
Retrieve more documents with higher diversity
k: Number of documents to return (default: 4)
fetch_k: Number of documents to pass to the MMR algorithm (default: 20)
lambda_mult: Controls the diversity of MMR results (0 to 1, default: 0.5)
Fetch more documents for the MMR algorithm, but return only the top two
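The sketch below shows an MMR retriever configuration together with a toy version of the selection rule, to make the roles of fetch_k and lambda_mult concrete. The function names mmr_retriever, cosine, and mmr, and the values k=2 and fetch_k=10, are assumptions for illustration; the mmr function is a plain-Python rendering of the general maximal marginal relevance idea, not the code that LangChain or Chroma runs.

```python
import math
from typing import List, Sequence


def mmr_retriever(db, k=2, fetch_k=10, lambda_mult=0.5):
    """Return a retriever that fetches fetch_k candidates and keeps k of them via MMR."""
    return db.as_retriever(
        search_type="mmr",
        search_kwargs={"k": k, "fetch_k": fetch_k, "lambda_mult": lambda_mult},
    )


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query, candidates, k=4, lambda_mult=0.5) -> List[int]:
    """Pick up to k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this toy version the candidates list plays the role of the fetch_k documents. A lambda_mult close to 1 ranks almost purely by relevance, while a value close to 0 favors candidates that differ from those already chosen.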
Retrieve only documents whose similarity is above a certain threshold
Search only the single most similar document
Apply specific metadata filters
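The three configurations listed above could be written as in the sketch below, assuming db is an existing Chroma store. The helper name make_retrievers, the threshold 0.8, the metadata key "source", and k=2 for the filtered case are illustrative assumptions.

```python
def make_retrievers(db, source, score_threshold=0.8):
    """Build three retrievers: thresholded, single best match, and metadata-filtered."""
    return {
        # Only documents scoring at or above the threshold are returned.
        "threshold": db.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={"score_threshold": score_threshold},
        ),
        # Only the single most similar document.
        "top1": db.as_retriever(search_kwargs={"k": 1}),
        # Only documents whose metadata matches the filter.
        "filtered": db.as_retriever(
            search_kwargs={"filter": {"source": source}, "k": 2},
        ),
    }
```

Each value in the returned dictionary is a retriever, so a call such as make_retrievers(db, "a.txt")["top1"].invoke(query) runs the corresponding search.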
Multimodal Search
Chroma supports multimodal collections, that is, collections that can contain and be queried with multiple kinds of data.
Dataset
Use a small subset of the COCO object detection dataset hosted on Hugging Face.
Only some of the images in the dataset are downloaded locally and used to create a multimodal collection.
Multimodal Embeddings
Use multimodal embeddings to create embeddings for images and text.
In this tutorial, we use OpenClipEmbeddingFunction to embed the images.
Model benchmark
Model             Training data  Resolution  # of samples seen  ImageNet zero-shot acc.
ConvNext-Base     LAION-2B       256px       13B                71.5%
ConvNext-Large    LAION-2B       320px       29B                76.9%
ConvNext-XXLarge  LAION-2B       256px       34B                79.5%
ViT-B/32          DataComp-1B    256px       34B                72.8%
ViT-B/16          DataComp-1B    224px       13B                73.5%
ViT-L/14          LAION-2B       224px       32B                75.3%
ViT-H/14          LAION-2B       224px       32B                78.0%
ViT-L/14          DataComp-1B    224px       13B                79.2%
ViT-G/14          LAION-2B       224px       34B                80.1%
In the example below, model_name and checkpoint are set and used.
model_name: Name of the OpenCLIP model
checkpoint: Name of the training data of the OpenCLIP model
   model_name       checkpoint
0  RN50             openai
1  RN50             yfcc15m
2  RN50             cc12m
3  RN50-quickgelu   openai
4  RN50-quickgelu   yfcc15m
5  RN50-quickgelu   cc12m
6  RN101            openai
7  RN101            yfcc15m
8  RN101-quickgelu  openai
9  RN101-quickgelu  yfcc15m
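A listing of model_name and checkpoint pairs can be obtained from the open_clip package, whose list_pretrained function returns (model name, pretrained tag) tuples. The sketch below assumes the open_clip_torch package is installed; list_openclip_checkpoints is a hypothetical helper, and the exact rows returned depend on the installed version.

```python
def list_openclip_checkpoints(limit=10):
    """Return the first `limit` (model_name, checkpoint) pairs known to open_clip."""
    # Lazy import: requires the open_clip_torch package (assumed installed).
    import open_clip

    return open_clip.list_pretrained()[:limit]
```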
Save the image paths as a list.
Create a description for each image.
Below, we calculate the similarity between the image description and the generated text.
Compute and visualize the similarity between the text and the image descriptions.
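The tutorial does not state which similarity measure is used, so the sketch below assumes cosine similarity between embedding vectors. The function names and the small example vectors are invented for illustration; real embeddings would come from the multimodal embedding model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(text_vecs, image_vecs) -> List[List[float]]:
    """Row i, column j holds the similarity of text i to image j."""
    return [[cosine_similarity(t, i) for i in image_vecs] for t in text_vecs]


# Hypothetical 2-dimensional embeddings, for illustration only.
matrix = similarity_matrix([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

A matrix in this shape can be passed to a heatmap plot to visualize which text matches which image most closely.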
Creating the vector store and adding images
Create the vector store and add the images.
Below is a helper class that displays the retrieved results as images.