Why PrivateGPT?

PrivateGPT provides an API (a tool for computer programs) that has everything you need to create AI applications that understand context and keep things private.

It’s like a set of building blocks for AI. This API is designed to work just like the OpenAI API, but it has some extra features.

So, if you’re already using the OpenAI API in your software, you can switch to the PrivateGPT API without changing your code, and it won’t cost you any extra money.

You can use PrivateGPT with CPU only if you want

Forget about expensive GPU’s if you dont have one already πŸ’―

The PrivateGPT Project

PrivateGPT allow us to chat with our documents locally and without an internet connection.

How to use PrivateGPT?

  • Setting Up PrivateGPT: Choose Your Path πŸ› οΈ

    • Manual Setup πŸ”:
      1. Detailed Documentation: Follow the official documentation to install all dependencies manually.
  • Docker-based Setup 🐳:

  1. Streamlined Process: Opt for a Docker-based solution to use PrivateGPT for a more straightforward setup process.

SelfHosting PrivateGPT

I have tried those with some other project and they worked for me 90% of the time, probably the other 10% was me doing something wrong.

To make sure that the steps are perfectly replicable for anyone, I bring you a guide with PrivateGPT & Docker to contain all the Dependencies (and make it work 100% of the times).

  • To ensure that the steps are perfectly replicable for anyone, I’ve created a guide on using PrivateGPT with Docker to contain all dependencies and make it work flawlessly 100% of the time.
    • 🐳 Follow the Docker image setup guide for quick setup here.
    • πŸ’‘ Alternatively, learn how to build your own Docker image to run PrivateGPT locally, which is the recommended approach. You can find the guide here.

With this approach, you will need just one thing to setup PrivateGPT locally: get Docker Installed

PrivateGPT with Docker

  1. Get Docker: Ensure that Docker is installed on your machine.
  1. Choose Your Docker Image:

    • Use a Pre-built Image: Download and run a pre-built Docker image suitable for your project.
    • Build Your Own Image: Alternatively, you can build your own Docker image from a Dockerfile tailored to your specific requirements.
  2. Configure NGINX for HTTPS: (Optional)

Then, use the following Stack / Docker configuration file to deploy it:

That’s it, now get your favourite LLM model ready and start using it with the PrivateGPT UI at: localhost:8001

Succesfully PrivateGPT Local Installation with Docker

Remember that you can use CPU mode only if you dont have a GPU (It happens to me as well).

Just remember to use models compatible with llama.cpp, as the project suggests .

PrivateGPT API

PrivateGPT API is OpenAI API (ChatGPT) compatible, this means that you can use it with other projects that require such API to work.

How to Build your PrivateGPT Docker Image

The best way (and secure) to SelfHost PrivateGPT. Build your own Image.

You will need a Dockerfile.

But dont worry, here you have a sample Dockerfile to build your own PrivateGPT Docker Image.

It’s not magic, just some automation to make PrivateGPT work without much effort.

The setup script will download these 2 models by default:

Use GGUF format for the models and it will be fine (llama.cpp related)

And then build your Docker image to run PrivateGPT with:

docker build -t privategpt .
#podman build -t privategpt .

#docker tag privategpt docker.io/fossengineer/privategpt:v1 #example I used
#docker push docker.io/fossengineer/privategpt:v1

docker-compose up -d #to spin the container up with CLI

Using your PrivateGPT Docker Image

You will need Docker installed and use the Docker-Compose Stack below.

If you are not very familiar with Docker, don’t be scared and install Portainer to deploy the container with GUI .

When the server is started it will print a log Application startup complete.

Wait for the initial PrivateGPT setup to complete

Execute the comand make run in the container:

docker exec -it privategpt make run

Navigate to http://localhost:8002 to use the Gradio UI or to http://localhost:8002/docs (API section) to try the API using Swagger UI.

PrivateGPT UI Locally


FAQ

Other F/OSS Alternatives to have Local LLMs

What is happening inside PrivateGPT?

These are the guts of our PrivateGPT beast.

The Embedding Model will create the vectorDB records of our documents and then, the LLM will provide the replies for us.

  • Embedding Model - BAAI/bge-small-en-v1.5
  • Conversational Model (LLM) - TheBloke/Mistral 7B
  • VectorDBs - PrivateGPT uses QDrant (F/OSS βœ…)
  • RAG Framework - PrivateGPT uses LLamaIndex (yeap, also F/OSS βœ…)

You can check and tweak this default options with the settings.yaml file.

Python for PrivateGPT

What are Gradio Apps?

Gradio is an open-source Python library that simplifies the development of interactive machine learning (ML) and natural language processing (NLP) applications.

You can also run Gradio-Lite (JS) - Serverless Gradio running in the browser. Similarly we can do with Transformers.JS

Python Dependencies 101

When we are sharing software, we need to make sure that our PCs have the same libraries/packages installed.

Same applies when we try some Python coding

How does PrivateGPT compare to Ollama?

Ollama is a model runner β€” it serves an OpenAI-compatible API and you build the rest. PrivateGPT is a complete app: ingestion pipeline, vector store (Qdrant), RAG framework (LlamaIndex), and a Gradio UI. Want to chat with your own PDFs out of the box? PrivateGPT. Want to wire LLMs into your own application? Ollama (plus Dify AI or Flowise AI on top).

What model size makes sense on CPU?

7B with 4-bit quantization (Q4_K_M GGUF) is the sweet spot β€” fast enough to feel responsive on a modern laptop. Mistral 7B Instruct, Llama 3 8B, and Phi-3 Medium all run well. 13B works but is noticeably slower. Anything larger and you really want a GPU.

How do I add my own documents?

In the PrivateGPT UI, use the Ingest data tab β€” it accepts PDFs, Markdown, plain text, HTML, and a few more formats. Behind the scenes, the embedding model (BAAI/bge-small-en-v1.5 by default) converts each chunk to a vector and stores it in Qdrant. The chat interface then retrieves relevant chunks per question.

How do I expose PrivateGPT safely?

Behind Nginx Proxy Manager for HTTPS, or via Cloudflare Tunnel without opening router ports. The Gradio UI has no built-in auth β€” add it at the proxy layer. Combine with Uptime Kuma to keep an eye on availability.

Can I swap out the LLM and embedding model?

Yes β€” both are configured in settings.yaml. Drop a different GGUF model into the models folder and update the path. Same for the embedding model (anything compatible with the SentenceTransformers loader works). Re-ingestion is required if you change the embedding model.