Interacting with Generative AI Language Models Locally

Generative AI: LLM Locally

One of the most fascinating breakthroughs has been in the realm of generative AI, particularly those specialized in text. These innovative models, like the artist with a blank canvas, craft sentences, paragraphs, and stories, stitching together words in ways that were once the exclusive domain of human intellect. No longer just tools for querying databases or executing commands, these AI are akin to novelists, poets, and playwrights,furthermore, they are equipped with the ability to program and even create full projects on their own.

GPT-3 and GPT-4 from OpenAI are two of the most well-known LLMs. They are both large language models with billions of parameters, and they can be used for a variety of tasks, such as generating text, translating languages, and writing different kinds of creative content.
PaLM (Pathways Language Model) is a 540 billion parameter LLM from Google AI. It is one of the largest LLMs ever created, and it can perform a wide range of tasks, including question answering, coding, and natural language inference.
LaMDA (Language Model for Dialogue Applications) is a 137 billion parameter LLM from Google AI. It is designed specifically for dialogue applications, such as chatbots and virtual assistants.
Chinchilla is a 300 billion parameter LLM from DeepMind. It is one of the most efficient LLMs available, and it can be used for a variety of tasks, such as machine translation and text summarization.
LLaMA (Large Language Model Meta AI) is a 65 billion parameter LLM from Meta AI. It is designed to be more accessible than other LLMs, and it is available in smaller sizes that require less computing power. Llama has also spawned a number of open source derivatives:
- Vicuna is a 33 billion parameter LLM that is based on LLaMA. It is fine-tuned on a dataset of human conversations, and it is designed for dialogue applications.
- Orca is a 13 billion parameter LLM that is based on LLaMA. It is designed to be efficient and easy to use, and it can be used for a variety of tasks, such as text generation, translation, and question answering.
- Guanaco is a family of LLMs that are based on LLaMA. They come in a variety of sizes, from 7 billion to 65 billion parameters. They are designed for a variety of tasks, such as machine translation, question answering, and natural language inference.
More Open Source models? Have a look as well to Falcon

While the promise of this technology sounds almost like science fiction and there’s considerable hype surrounding it, there’s truly no better way to understand its capabilities than to experience it firsthand. So, why merely read about it when you can delve into its intricate workings yourself?

Let’s demystify the buzz and see what these models are genuinely capable of. In this post, I’ll guide you on how to interact with these state-of-the-art LLM models locally, and the best part? You can do it for free and using just the CPU.

Installing LLMs locally: Vicuna

We need an interface to use our LLMs and there is a perfect project that uses a Gradio Web UI.

In general the instructions of this projects work and we can replicate it fairly easy, but I thought to simplify the dependencies setup with Docker. So what you will need is:

Install Docker
Install Portainer - This will make easier the container management.
Have a look to the fantastic OObabooga’s Project
- The container built is based on the instructions of this repo
Decide which LLM Model you want to use: have a look to huggingface

The Docker Container: TextGenerationWebUI

I already created the container and pushed it to dockerhub to avoid the quite long waiting time of dependencies installations etc.

The docker-compose / Stack to use in Portainer is as simple as this:

version: '3'

services:
  genai_text:
    image: fossengineer/oobabooga_cpu
    container_name: genai_ooba
    ports:
      - "7860:7860"
    working_dir: /app
    command: tail -f /dev/null #keep it running
    volumes: #Choose your way
     # - C:/Path/to/Models/AI/Docker_Vol:/app/text-generation-webui/models
     # - /home/AI_Local:/app/text-generation-webui/models
     # - appdata_ooba:/app/text-generation-webui/models

# volumes:
#   appdata_ooba:

This will spin up a docker container with Python and Oobabooga’s Web UI dependencies already installed.

Inside this container, we just miss one thing, the LLM models: for that, download it in your PC and setup the proper Bind volume in the docker yml file above, so that the container is able to see the .bin files.

Adding a LLM Model

We can try for example with Vicuna.

Go to HuggingFace and download one of the models: https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/tree/main
- I tried it with ggml-vic7b-uncensored-q5_1.bin
Deploy the yml above with the folder in your system that contains the .bin file
Then execute: conda init bash
Restart the interactive terminal and execute the following

conda activate textgen 
cd text-generation-webui
#python server.py
python server.py --listen

With those commands we activated the conda textgen environment, then navigated to the folder where all the action happens and execute the Python server (when doing it inside a docker container we need the –listen flag)

FAQ

How to try Safely LLMs with Docker?

You can use a Python container and install the dependencies in a fresh environment with:

version: '3'

services:
  my-python-app:
    image: python:3.8-slim
    container_name: python-dev
    command: tail -f /dev/null
    volumes:
      - python_dev:/app
    working_dir: /app  # Set the working directory to /app
    ports:
      - "8501:8501"

volumes:
  python_dev:

Generative AI: LLM Locally#

Installing LLMs locally: Vicuna#

The Docker Container: TextGenerationWebUI#

Adding a LLM Model#

FAQ#

How to try Safely LLMs with Docker?#