Generative AI: LLM Locally

One of the most fascinating breakthroughs has been in generative AI, particularly in models specialized in text.

These innovative models, like an artist with a blank canvas, craft sentences, paragraphs, and stories, stitching together words in ways that were once the exclusive domain of human intellect.

No longer just tools for querying databases or executing commands, these AIs are akin to novelists, poets, and playwrights. Moreover, they can program and even create full projects on their own.

  • GPT-3 and GPT-4 from OpenAI are two of the most well-known LLMs.

    • They are both large language models with billions of parameters, and they can be used for a variety of tasks, such as generating text, translating languages, and writing different kinds of creative content.
  • PaLM (Pathways Language Model) is a 540-billion-parameter LLM from Google AI. It is one of the largest LLMs ever created, and it can perform a wide range of tasks, including question answering, coding, and natural language inference.

While the promise of this technology sounds almost like science fiction and there’s considerable hype surrounding it, there’s truly no better way to understand its capabilities than to experience it firsthand.

So, why merely read about it when you can delve into its intricate workings yourself?

Let’s demystify the buzz and see what these models are genuinely capable of.

In this post, I'll guide you through interacting with these state-of-the-art LLMs locally. The best part? You can do it for free, using just your CPU.

Using LLMs Locally

We need an interface to interact with our LLMs, and there is a perfect project for that, built on a Gradio web UI.

What is Gradio?

Gradio is a fantastic open-source Python library designed to streamline the creation of user interfaces (UIs), especially for machine learning demos.

Share delightful machine learning apps, all in Python
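
To get a feel for how little code Gradio needs, here is a minimal, hypothetical demo (not part of the project below), just to illustrate the library:

import gradio as gr

# A toy function to expose through the web UI.
def greet(name):
    return f"Hello, {name}!"

# Interface wires the function to input and output widgets.
demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()  # serves the app, by default at http://localhost:7860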

The Text Generation Web UI Project

In general, the project's own instructions work and can be replicated fairly easily, but I thought I would simplify the dependency setup with Docker.

So, all you will need is Docker (and optionally Portainer, if you prefer managing your stacks from a UI).

SelfHosting TextGenerationWebUI

I have already built the container image and pushed it to DockerHub, to spare you the quite long wait for installing dependencies.

For CPU-only inference, you will probably need this PyTorch build:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu 
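Once it is installed, a quick sanity check (just a sketch; the exact version string will vary) confirms that you got the CPU build:

import torch

print(torch.__version__)          # CPU builds typically report a version ending in "+cpu"
print(torch.cuda.is_available())  # expected to be False on a CPU-only install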

The docker-compose file (or Stack, if you use Portainer) is as simple as this, and we get TextGenWebUI:

#version: '3' # the version key is optional in recent Docker Compose releases

services:
  genai_text:
    image: fossengineer/oobabooga_cpu
    container_name: genai_ooba
    ports:
      - "7860:7860"
    working_dir: /app
    command: tail -f /dev/null # keep the container running
    volumes: # choose one option; adjust the host path to wherever your models live
      - /home/AI_Local:/app/text-generation-webui/models
      # - C:/Path/to/Models/AI/Docker_Vol:/app/text-generation-webui/models
      # - appdata_ooba:/app/text-generation-webui/models

# volumes: # uncomment together with the named-volume option above
#   appdata_ooba:
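
To deploy it, save the file as docker-compose.yml and run docker compose up -d, or paste it as a Stack in Portainer.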

This will spin up a Docker container with Python and Oobabooga's Web UI dependencies already installed.
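
One caveat: the tail -f /dev/null command only keeps the container alive; it does not launch the web UI by itself. Open a shell inside the container with docker exec -it genai_ooba bash and start the server from the /app/text-generation-webui folder, typically with python server.py --listen --cpu (double-check the project's README for the current flags).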

The Gradio UI to interact with your LLMs will then be ready at: http://localhost:7860

Inside this container, just one thing is missing: the LLM models themselves. Download them on your PC and set up the matching bind volume in the docker-compose file above, so that the container can see the model files (.gguf or .bin).

Adding an LLM Model

  • You can try GGUF models: they are a single file and should be placed directly into the models folder (see the download sketch after this list).
    • GGUF is a format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML.
    • Thanks to https://github.com/ggerganov/llama.cpp you can convert models from HF/GGML/LoRA formats to .gguf.
  • The remaining model types (like 16-bit transformers models and GPTQ models) are made of several files and must be placed in a subfolder.
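
For example, here is a minimal sketch of downloading a GGUF model with the huggingface_hub Python library. The repo and file names below are only illustrative; substitute whichever GGUF model you want:

# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Illustrative repo and file names; pick any GGUF model from Hugging Face.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir="/home/AI_Local",  # the host folder you bind-mounted into the container
)
print("Model saved to:", model_path)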

TextGenWebUI Local UI


FAQ

Ways to Evaluate LLMs

How to Safely Try LLMs with Docker?

You can use a Python container and install the dependencies in a fresh environment with:

#version: '3'

services:
  my-python-app:
    image: python:3.11-slim
    container_name: python-dev
    command: tail -f /dev/null
    volumes:
      - python_dev:/app
    working_dir: /app  # Set the working directory to /app
    ports:
      - "8501:8501"

volumes:
  python_dev:
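
Then attach to it with docker exec -it python-dev bash and pip install whatever you want to experiment with: everything stays inside the container and its python_dev volume, keeping your host system clean.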

A Detailed Video on Using TextGenWebUI with Docker