So you want to create applications with Vector Databases.

Yes, Apps with Gen AI. Even better, with local open source LLMs and custom data (local databases).

You have already had a look at projects like PrivateGPT, which combine an embedding model with a conversational model, and now you wonder how to manage those VectorDBs with that local knowledge.

Keep reading if you want to be one of the first to use LLMs with your private knowledge base.

VectorDBs and LLMs

Vector databases store and manage data in the form of vectors. Each vector represents a data point in a multidimensional space.

What? Basically, data like text, images, or audio is converted into numerical vectors, called embeddings, using models (like neural networks). These embeddings capture the essence or features of the data.
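To make the "data becomes a vector" idea concrete, here is a minimal sketch. It is not a real embedding model: `toy_embed` is a hypothetical helper that only hashes words into a fixed number of buckets, so it captures word overlap rather than meaning. A real setup would call a neural embedding model instead, but the output has the same shape: a fixed-length list of numbers.

```python
import hashlib
import math


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Map text to a fixed-length vector by hashing words into buckets.

    Assumption: this is a stand-in for a real embedding model. It keeps
    the interface (text in, vector out) but not the semantics.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        # Use the first byte of a stable hash to pick a bucket.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    # L2-normalise so vectors of different text lengths are comparable.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


vector = toy_embed("vector databases store embeddings")
print(len(vector))  # one number per dimension
```

Whatever model produces the vector, the database only ever sees the list of numbers, which is why the same VectorDB can hold text, image, or audio embeddings.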

VectorDBs excel in searching for similar items. For example, given an image embedding, a vector database can quickly find the most similar images in its storage.

The same applies to text, where we can get semantically similar results.
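As a rough sketch of what "finding the most similar items" means, the snippet below ranks stored vectors by cosine similarity to a query vector. The item names and the three-dimensional vectors are made up for illustration, and the brute-force scan is an assumption made for clarity: real vector databases typically rely on approximate nearest neighbor indexes to stay fast on large collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
store = {
    "cat photo": [0.9, 0.1, 0.0],
    "kitten photo": [0.8, 0.2, 0.1],
    "invoice scan": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

With these toy vectors, the two cat-related items rank above the invoice, because their vectors point in nearly the same direction as the query.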

Why VectorDBs?

  • Handling Complex Data: Ideal for applications dealing with non-traditional data types like images, audio, and natural language.
  • Scalability: They can efficiently handle large-scale datasets, crucial for machine learning and big data applications.
  • Speed and Accuracy: Provide fast and accurate results for similarity searches, crucial for recommendation systems, image retrieval, etc.
  • AI and Machine Learning Projects: Useful for students working on AI projects, as they often involve dealing with embeddings.

How to use VectorDBs?

We can SelfHost many F/OSS Vector Databases with Docker, but the point here is: how do we properly manage the content of such DBs?

We are lucky enough to have VectorAdmin (also a F/OSS project), which allows us to manage VectorDBs with a UI.

Consider VectorAdmin our frontend for VectorDBs: embed your knowledge once and manage it with a UI.

The good news is that you can get started pretty quickly with VectorAdmin, the frontend for vector databases.

SelfHosting VectorAdmin with Docker

To make sure that it works for any of you, I prepared this SelfHosting setup of VectorAdmin with Docker.

Pre-Requisites - Get Docker 🐋

Important step and quite recommended for any SelfHosting Project - Get Docker Installed

If you are on a Debian-based Linux, it takes just these two lines:

sudo apt-get update && sudo apt-get upgrade && curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh && docker version

With Docker installed, the steps that we need are:

git clone git@github.com:Mintplex-Labs/vector-admin.git ./vector-admin
cd vector-admin
cd docker
cp .env.example .env  # and adjust it

Once you have adjusted the .env, let's build the VectorAdmin Docker image and start it:

sudo docker-compose up -d --build vector-admin

Now it's time to relax and enjoy your GUI for vector DBs like Qdrant, ChromaDB or Pinecone.


FAQ

F/OSS Vector DBs for AI Projects?

ChromaDB

ChromaDB is a vector database tailored for efficient storage and retrieval of high-dimensional data.

It describes itself as the AI-native open-source embedding database, and you will see it everywhere from now on. And yes, you can SelfHost ChromaDB.

  • Key Features:
    • Optimized for Similarity Search: Specializes in nearest neighbor search, crucial for tasks like image or voice recognition.
    • High Scalability: Can handle large datasets, which is essential for machine learning and AI-based applications.
  • Use Cases: Suited for applications that need efficient similarity search in large vector datasets, such as facial recognition systems, audio fingerprinting, etc.

  • The ChromaDB Site
  • The ChromaDB Source Code at Github

More F/OSS VectorDBs 👇

Elasticsearch

While primarily a search engine, it can be used as a vector database with its dense_vector datatype and KNN search capabilities.
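Assuming the search engine described here is Elasticsearch (the dense_vector type is its field type), the sketch below shows what an index mapping and a KNN search body could look like, built as plain Python dictionaries. The field names `text` and `embedding`, the 3 dimensions, and the helper `build_knn_query` are hypothetical, and the layout follows the 8.x request format as I understand it, so check the documentation for your version before relying on it. Nothing is sent over the network here.

```python
import json

# Hypothetical index layout: one text field plus a 3-dimensional vector field.
index_mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 3,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}


def build_knn_query(query_vector, k=2, num_candidates=10):
    """Build a search body asking for the k nearest stored vectors."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["text"],
    }


knn_body = build_knn_query([0.1, 0.2, 0.3])
print(json.dumps(knn_body, indent=2))
```

The mapping would go into the index creation request and the second dictionary into a search request, using whichever HTTP or Elasticsearch client you prefer.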

Milvus

An open-source vector database designed for scalable similarity search and AI applications.

Qdrant

A vector search engine that is optimized for storing and searching large volumes of vector data.

Faiss

By Facebook AI: Primarily a library for efficient similarity search, but can be used in conjunction with databases to handle vector data.

  • The faiss Site
  • The faiss Source Code at Github
    • License: MIT ❤️

Pinecone

A scalable, managed vector database service. It is not open source, but it offers a free tier that can be useful for students.

LanceDB

LanceDB is a vector database that focuses on providing high performance for both ingestion and querying of vector data.

  • Key Features:
    • Efficient Indexing: It uses advanced indexing techniques to handle large-scale vector data efficiently.
    • Real-time Processing: Designed for real-time data processing, making it suitable for applications that require immediate insights from vector data.
  • Use Cases: Ideal for scenarios where both high-speed data ingestion and querying are critical, such as real-time recommendation systems, image retrieval systems, etc.

Weaviate

Weaviate is an open-source smart vector search engine that allows for storage and retrieval of high-dimensional vector data.

  • Key Features:
    • Semantic Search: Integrates machine learning models to enable semantic search capabilities.
    • GraphQL API: Offers a GraphQL interface for querying, making it accessible and easy to integrate into various applications.
    • Scalable Architecture: Designed to scale horizontally, facilitating the management of large datasets.
  • Use Cases: Particularly useful for developers building applications that require semantic understanding and context-aware searching, like advanced search engines, recommendation systems, etc.
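To give a feel for the GraphQL interface mentioned above, here is a minimal sketch that builds a vector search query as a string. The class name `Article`, the `title` property, the 3-dimensional vector, and the helper `build_near_vector_query` are all hypothetical; the `Get`, `nearVector`, and `_additional` parts follow Weaviate's GraphQL conventions as I understand them. The snippet only prepares the request payload and does not contact a server.

```python
import json


def build_near_vector_query(class_name, vector, fields, limit=3):
    """Build a GraphQL query returning the objects closest to `vector`."""
    vector_literal = json.dumps(list(vector))
    field_list = " ".join(fields)
    return (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ %s _additional { distance } } } }"
        % (class_name, vector_literal, limit, field_list)
    )


# Hypothetical class and property names, for illustration only.
query = build_near_vector_query("Article", [0.1, 0.2, 0.3], ["title"], limit=2)

# GraphQL over HTTP wraps the query string in a JSON object.
payload = json.dumps({"query": query})
print(payload)
```

Keeping the query construction in a small function like this makes it easy to swap in a real embedding vector later without touching the rest of the application.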