# 10. pgvector

## Using pgvector with LangChain

### Introduction to pgvector

pgvector is a PostgreSQL extension that adds a `vector` column type, distance operators, and index types for exact and approximate nearest-neighbor search over embeddings. This makes it a good fit for AI applications such as recommendation systems, semantic search, and retrieval-augmented generation (RAG). Because pgvector runs inside PostgreSQL, embeddings live next to your relational data and can be queried, filtered, and joined with ordinary SQL.
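To make "similarity search" concrete, here is a minimal plain-Python sketch of the three distances that pgvector exposes as SQL operators: `<->` (Euclidean distance), `<=>` (cosine distance), and `<#>` (negative inner product). The function names are illustrative and are not part of pgvector.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's `<->` operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negative dot product, the quantity pgvector's `<#>` operator returns."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's `<=>` operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Rank two toy "documents" by cosine distance to a query vector (smaller = more similar).
query = [1.0, 0.0]
docs = {"same direction": [2.0, 0.0], "orthogonal": [0.0, 3.0]}
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

A similarity search is the same ranking performed inside the database, with an index making it fast for large tables.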

### Setting Up pgvector

#### 1. Installing pgvector

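To use pgvector, enable the extension in each database that will store vectors:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

The statement succeeds only if the pgvector files are already installed on the PostgreSQL server, for example from your operating system's packages, the `pgvector/pgvector` Docker image, or a managed service that bundles the extension. The Python examples in this chapter also need a PostgreSQL driver such as psycopg2: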

```bash
pip install psycopg2-binary
```

#### 2. Creating a pgvector Client

Once the extension is enabled, open a connection to the database from Python:

```python
import psycopg2

conn = psycopg2.connect(
    dbname="your_db",
    user="your_user",
    password="your_password",
    host="localhost",
    port=5432
)
cursor = conn.cursor()
```

Replace the connection details with your database credentials.
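Rather than hard-coding credentials, one option is to read them from the environment. The sketch below assumes the standard libpq variable names (`PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`); the helper name `connection_params` and the fallback values are hypothetical.

```python
import os

def connection_params(env=os.environ):
    """Collect psycopg2.connect() keyword arguments from environment variables."""
    return {
        "dbname": env.get("PGDATABASE", "your_db"),
        "user": env.get("PGUSER", "your_user"),
        "password": env.get("PGPASSWORD", ""),
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
    }

# Demonstrate with an explicit mapping instead of the real environment.
params = connection_params({"PGDATABASE": "vectors", "PGPORT": "5433"})
print(params["dbname"], params["port"])
```

With a reachable server, the result can be passed straight through as `psycopg2.connect(**connection_params())`.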

### Integrating pgvector with LangChain

LangChain integrates with pgvector through its `PGVector` vector store class, which handles embedding text, writing it to PostgreSQL, and running similarity queries.

#### 1. Creating a pgvector Table

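If you want to manage vectors directly in SQL, define a table with a `vector` column whose dimension matches your embedding model (1536 here, the output size of OpenAI's `text-embedding-ada-002`). LangChain's `PGVector` store creates and manages its own tables (`langchain_pg_collection` and `langchain_pg_embedding`), so this manual schema is only needed for direct SQL access: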

```sql
CREATE TABLE langchain_docs (
    id SERIAL PRIMARY KEY,
    embedding vector(1536),
    text TEXT
);
```
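For direct SQL access, a row can be inserted by passing the embedding in pgvector's text format (`[x1,x2,...]`) and casting it to `vector`. This is a sketch under assumptions: the helper names are hypothetical, and `cursor` is assumed to be a DB-API cursor such as the psycopg2 cursor created earlier.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

INSERT_SQL = "INSERT INTO langchain_docs (embedding, text) VALUES (%s::vector, %s)"

def insert_document(cursor, text, embedding):
    """Insert one row; values are passed as parameters, never interpolated into the SQL."""
    cursor.execute(INSERT_SQL, (to_vector_literal(embedding), text))

print(to_vector_literal([0.1, 2, -3.5]))  # [0.1,2.0,-3.5]
```

Against the table above, the embedding must contain exactly 1536 values, and the insert only becomes visible to other sessions after `conn.commit()`.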

#### 2. Storing Embeddings in pgvector

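To store vectors, create an embedding model (for example OpenAI or Hugging Face) and pass it to the `PGVector` store together with a SQLAlchemy-style connection string. The snippet below assumes the `langchain-community` and `langchain-openai` packages; import paths and argument names have changed between LangChain releases (newer releases also offer `PGVector` in the `langchain-postgres` package), so check them against the version you have installed:

```python
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

# PGVector expects a SQLAlchemy connection string, not a psycopg2 connection object.
CONNECTION_STRING = "postgresql+psycopg2://your_user:your_password@localhost:5432/your_db"

embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
vector_db = PGVector(
    connection_string=CONNECTION_STRING, embedding_function=embeddings, collection_name="langchain_docs"
)
```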

Now store some text. `add_texts` embeds each string with the configured model and writes the text and its vector to PostgreSQL:

```python
documents = ["This is a sample document.", "LangChain makes working with LLMs easier."]
vector_db.add_texts(texts=documents)
```

#### 3. Performing Similarity Search

Retrieve documents similar to a given query:

```python
query = "How does LangChain help with LLMs?"
results = vector_db.similarity_search(query, k=2)

for result in results:
    print(result.page_content)
```

This returns the two stored documents whose embeddings are closest to the embedding of the query.
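The same top-k lookup can be written directly in SQL against a hand-made table such as `langchain_docs` from the schema step (not the tables LangChain manages itself). The sketch below orders rows by cosine distance using pgvector's `<=>` operator; the function name `search_similar` is hypothetical, `cursor` is assumed to be a psycopg2 cursor, and the query embedding is assumed to come from the same model that produced the stored vectors.

```python
SEARCH_SQL = """
SELECT text, embedding <=> %s::vector AS distance
FROM langchain_docs
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def search_similar(cursor, query_embedding, k=2):
    """Return up to k (text, distance) rows, nearest first."""
    # pgvector accepts vectors in the text form '[x1,x2,...]'.
    literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    cursor.execute(SEARCH_SQL, (literal, literal, k))
    return cursor.fetchall()
```

Keeping the distance expression itself in `ORDER BY`, followed by `LIMIT`, is the query shape that pgvector's approximate indexes are designed to accelerate.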

### Best Practices and Optimization

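* **Indexing**: Without an index, pgvector performs an exact scan of every row. Add one of pgvector's approximate indexes, HNSW or IVFFlat, to trade a small amount of recall for much faster search on large tables.
* **Scalability**: Use standard PostgreSQL tools such as table partitioning, read replicas, and connection pooling as the number of embeddings and queries grows.
* **Hybrid Search**: Combine ordinary SQL filters (`WHERE` clauses on metadata or text) with vector ordering so that only relevant rows are ranked.
* **Cloud Deployment**: Consider a managed PostgreSQL service that supports pgvector, such as AWS RDS or Google Cloud SQL.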
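The indexing and hybrid-search points can be sketched as follows. The helper `create_index_sql` is hypothetical and simply builds the `CREATE INDEX` statements that pgvector documents; the `lists` value and the `ILIKE` filter are illustrative assumptions, not tuned recommendations. The operator class should match the distance operator used in queries (`vector_cosine_ops` pairs with `<=>`).

```python
def create_index_sql(table, column, method="hnsw", opclass="vector_cosine_ops", lists=100):
    """Build a CREATE INDEX statement for a pgvector column.

    `method` is "hnsw" or "ivfflat"; `lists` only applies to IVFFlat.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method}")
    sql = f"CREATE INDEX ON {table} USING {method} ({column} {opclass})"
    if method == "ivfflat":
        sql += f" WITH (lists = {int(lists)})"
    return sql + ";"

# Hybrid search: an ordinary SQL predicate narrows the rows, then vector distance ranks them.
HYBRID_SQL = """
SELECT text
FROM langchain_docs
WHERE text ILIKE %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

print(create_index_sql("langchain_docs", "embedding"))
print(create_index_sql("langchain_docs", "embedding", method="ivfflat", lists=200))
```

The generated statements can be run with `cursor.execute(...)`. HNSW requires pgvector 0.5.0 or later, and an IVFFlat index is best created after the table already contains representative data.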

### Conclusion

pgvector adds vector storage and similarity search to PostgreSQL while keeping full SQL compatibility. Combined with LangChain's `PGVector` store, it lets you embed, store, and retrieve documents with a few lines of code, which makes it a practical choice for semantic search, recommendations, and retrieval-augmented generation (RAG) applications.
