# 05. Weaviate

## Using Weaviate with LangChain

### Introduction to Weaviate

Weaviate is an open-source, AI-native vector database designed for storing and retrieving high-dimensional embeddings efficiently. It is particularly suited for applications such as semantic search, recommendation systems, and retrieval-augmented generation (RAG). Weaviate supports hybrid search capabilities, including keyword-based and vector-based searches, and integrates well with various machine learning frameworks.

### Setting Up Weaviate

#### 1. Installing Weaviate

To use Weaviate, install the Weaviate client:

```bash
pip install weaviate-client
```

If you want to run a local instance of Weaviate, you can use Docker:

```bash
docker run -d -p 8080:8080 semitechnologies/weaviate
```

This will start a Weaviate instance locally, accessible on `http://localhost:8080`.

#### 2. Creating a Weaviate Client

Once installed, initialize a Weaviate client in Python:

```python
import weaviate

# v3-style client; the newer v4 client connects with weaviate.connect_to_local() instead
client = weaviate.Client("http://localhost:8080")
```

If you're using Weaviate Cloud, replace `localhost` with your Weaviate Cloud endpoint and provide authentication credentials.
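For example, with the v3 client an API key can be passed via `AuthApiKey`; a minimal sketch, where the endpoint and keys below are placeholders to replace with your own cluster values:

```python
import weaviate

# Placeholder endpoint and keys -- substitute your own cluster values.
# The extra header is only needed if you use an OpenAI-based vectorizer.
client = weaviate.Client(
    url="https://your-cluster.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),
    additional_headers={"X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"},
)
```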

### Integrating Weaviate with LangChain

LangChain provides seamless integration with Weaviate for vector-based storage and retrieval. The `Weaviate` wrapper in LangChain simplifies adding and retrieving vector embeddings.

#### 1. Creating a Weaviate Index

Before storing vectors, define a schema in Weaviate:

```python
schema = {
    "classes": [
        {
            "class": "LangChainDocs",
            "vectorizer": "text2vec-openai",
            # LangChain stores raw text in a property; "text" matches its default text_key
            "properties": [
                {"name": "text", "dataType": ["text"]},
            ],
        }
    ]
}

client.schema.create(schema)
```

This creates a collection named `LangChainDocs` with OpenAI’s `text2vec-openai` as the vectorizer.

#### 2. Storing Embeddings in Weaviate

To store vectors, first generate embeddings using an embedding model (e.g., OpenAI or Hugging Face):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate

embeddings = OpenAIEmbeddings()
# text_key names the Weaviate property that holds the raw text;
# note the parameter is "embedding", not "embeddings"
vector_db = Weaviate(client=client, index_name="LangChainDocs", text_key="text", embedding=embeddings)
```

Now, store some text data in Weaviate:

```python
documents = ["This is a sample document.", "LangChain makes working with LLMs easier."]
vector_db.add_texts(texts=documents)
```
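You can also attach per-document metadata when storing texts; a sketch using the `vector_db` instance from above (the `source` tags are hypothetical examples):

```python
# Hypothetical documents with per-document metadata
documents = ["Weaviate supports hybrid search.", "LangChain wraps many vector stores."]
metadatas = [{"source": "weaviate-docs"}, {"source": "langchain-docs"}]

# Metadata is stored alongside each text and returned on retrieval
vector_db.add_texts(texts=documents, metadatas=metadatas)
```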

#### 3. Performing Similarity Search

Retrieve documents similar to a given query:

```python
query = "How does LangChain help with LLMs?"
results = vector_db.similarity_search(query, k=2)

for result in results:
    print(result.page_content)
```

This fetches the top 2 documents that are most semantically similar to the query.
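Under the hood, similarity search compares the query embedding against the stored vectors with a distance metric such as cosine similarity and returns the closest matches. A minimal pure-Python sketch of the ranking idea, using toy 3-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- real ones have hundreds or thousands of dimensions
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank documents by similarity to the query, highest first
ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
print(ranked)
```

Here `doc_a` ranks first because its vector points in nearly the same direction as the query vector, which is exactly the property embeddings are trained to capture for semantically related text.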

### Best Practices and Optimization

* **Use Efficient Vectorizers**: Choose the right vectorizer (e.g., OpenAI, Cohere, Sentence Transformers) based on your use case.
* **Index Maintenance**: Regularly update and clean up old embeddings to keep the index optimized.
* **Hybrid Search**: Leverage Weaviate’s hybrid search capabilities for a combination of keyword and vector-based retrieval.
* **Cloud Deployment**: For production, consider using Weaviate Cloud for scalability and reliability.
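As a sketch of hybrid search with the v3 client's GraphQL query builder, run against the `LangChainDocs` class from earlier (the `alpha` parameter weights vector scoring against keyword scoring, with `0.5` giving them equal weight):

```python
# Hybrid search: combines BM25 keyword scoring with vector similarity
response = (
    client.query
    .get("LangChainDocs", ["text"])
    .with_hybrid(query="How does LangChain help with LLMs?", alpha=0.5)
    .with_limit(2)
    .do()
)
print(response["data"]["Get"]["LangChainDocs"])
```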

### Conclusion

Weaviate provides a robust, open-source alternative to proprietary vector databases like Pinecone, offering hybrid search capabilities and flexibility. Its integration with LangChain makes it an excellent choice for scalable and efficient AI applications. With proper setup and optimization, you can leverage Weaviate to enhance search, recommendation, and retrieval-augmented generation (RAG) applications.
