It's been a couple of years since GenAI became part of everyday life for most of us.

What will you need to follow?

Actually, just some time to read through this.

Being familiar with Containers and how to use them will be beneficial.

Let's see how to set up LibreChat, which we will compare with other ways of running LLMs locally.

The LibreChat Project

F/OSS Enhanced ChatGPT Clone: Features Agents, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting.

Terminal Based

Some tools let you interact with LLMs directly from the CLI.

Ollama

Ollama is one of my go-tos whenever I want to self-host LLMs, as its workflow is pretty similar to working with containers:

You can also get help from UI container management tools, like Portainer:

Successfully deploying Ollama with Docker
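
Once the Ollama container is up, you can also talk to it from Python over its REST API. This is a minimal sketch, assuming Ollama listens on its default port 11434 and that a model named llama3 has already been pulled (both are assumptions, so adjust them to your setup). The helper names are mine, not part of Ollama:

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default local port
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3", url=OLLAMA_URL):
    """Build the HTTP request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(prompt, model=model, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Example (needs a running Ollama server with the model pulled):
# print(ask_ollama("Five cute names for a pet penguin"))
```

Building the request is kept separate from sending it, so you can inspect the payload without a server running.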

Gpt4All

GPT4All is an awesome project for running LLMs locally.

Elia

Elia is a full TUI app that runs in your terminal, so it's not as lightweight as llm-term, but it uses a SQLite database and lets you continue old conversations.

Choosing Model with Elia

More AI CLI LLMs

Remember that the tools can be open while the LLMs behind them are proprietary.

Octogen

Octogen is an Open-Source Code Interpreter Agent Framework

python3 -m venv llms #create it

llms\Scripts\activate #activate venv (windows)
source llms/bin/activate #(linux)

#deactivate #when you are done

With Python and OpenAI

Remember that OpenAI is a closed source LLM!

Yet the Python API client to use it is OSS:

pip install openai

import os
from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)

# Print only the text of the reply
print(chat_completion.choices[0].message.content)

LLM - One Shot

Each call is independent: there is no memory of previous messages.

pip install llm
llm keys set openai

llm models
llm models default
llm models default gpt-4o

Now chat with your model with:

llm "Five cute names for a pet penguin"

You can redirect its output to a file:

llm "What do you know about Python?" > sample.mdx

It also works with local models thanks to the GPT4All plugin - https://github.com/simonw/llm-gpt4all

python-prompt-toolkit

llm-term keeps the entire conversation in memory while a session is running. However, each chat session starts fresh and doesn't store context from old conversations.

Library for building powerful interactive command line applications in Python
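
The in-memory behaviour described above can be sketched as follows. This is not llm-term's actual code, just an illustration: `ChatSession` and `echo_reply` are hypothetical names, and `echo_reply` stands in for a real LLM call so the example runs offline.

```python
class ChatSession:
    """Keeps the conversation in memory for as long as the object lives."""

    def __init__(self, reply_fn, system_prompt=None):
        # reply_fn is a hypothetical callable: it receives the full message
        # list and returns the assistant's reply as a string.
        self.reply_fn = reply_fn
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text):
        # Every turn is appended, so the model always sees the whole history
        self.messages.append({"role": "user", "content": user_text})
        answer = self.reply_fn(list(self.messages))
        self.messages.append({"role": "assistant", "content": answer})
        return answer


def echo_reply(messages):
    """Stand-in for a real LLM call: reports how many user turns it can see."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s)."


session = ChatSession(echo_reply)
print(session.ask("Hello"))
print(session.ask("Do you remember me?"))

# A new session starts from scratch, like a fresh one-shot call
print(ChatSession(echo_reply).ask("Hello again"))
```

The second call in the same session sees two user messages, while the new session only sees one: nothing is persisted once the object is gone.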


With UI

Open Web UI

This project was renamed! I first had a look at it back when it was called Ollama Web UI.

Using Ollama with Web UI

Thanks to noted.lol for the heads up on the project rename

koboldcpp

KoboldCpp is another project I reviewed previously.

KoboldCpp, an easy-to-use AI text-generation software for GGML and GGUF models.

LocalAI

PrivateGPT

PrivateGPT allows us to chat with our documents locally and without an internet connection, thanks to local LLMs.

It uses LlamaIndex as its RAG framework!

Successful PrivateGPT local installation with Docker

What is happening inside PrivateGPT?

These are the guts of our PrivateGPT beast.

The embedding model creates the vector DB records for our documents, and then the LLM uses the retrieved records to generate the replies for us.

  • Embedding Model - nomic-ai/nomic-embed-text-v1.5
  • Conversational Model (LLM) - lmstudio-community/Meta-Llama-3.1-8B
  • VectorDBs - PrivateGPT uses Qdrant (F/OSS ✅)
  • RAG Framework - PrivateGPT uses LlamaIndex (yep, also F/OSS ✅)

You can check and tweak these default options in the settings.yaml file.
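
The flow described above (embed the documents, retrieve the closest ones, hand them to the LLM) can be sketched in plain Python. This is a toy illustration, not PrivateGPT's code: `embed` is a bag-of-words stand-in for a real embedding model, a list plays the role of the vector DB, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, top_k=1):
    """Return the top_k document chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the conversational model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# 1. Ingestion: embed every chunk once and store (text, vector) pairs
documents = [
    "Invoices must be paid within 30 days.",
    "The office is closed on public holidays.",
    "Backups run every night at 2 am.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Query time: retrieve relevant chunks, then build the LLM prompt
question = "When do backups run?"
print(build_prompt(question, retrieve(question, index)))
```

In a real setup the last step would send that prompt to the LLM. Here it is only printed, so you can see which chunk was picked as context.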

Oobabooga - TextGenWebUI

This Gradio-based TextGenWebUI project for interacting with LLMs was the first one I tried last year, and it can run in containers:

The Gradio app will wait for you at: localhost:7860

TextGenWebUI Local UI

Conclusions

As of today, there are many ways to use LLMs locally.

And most of them work on regular hardware (without crazy expensive GPUs).

One clear use case is, of course, using GenAI to code, which will hopefully bring us more open source apps to self-host!

Also, you can leverage AI for research tasks with scraping:

What about fully open sourced LLMs?

Have a look at LLM360 (https://www.llm360.ai/), a community-driven AGI OSS project.

They aim to make the end-to-end LLM training process transparent and reproducible.

That's what OSS is all about, right?

Using LLMs to Code

As you know, I'm not a developer.

But AI has been helping me a lot over the last few years.

You can try using VSCodium with the Tabby extension and these open LLMs behind it.


FAQ

How to use Local AI with NextCloud

Nextcloud is…Amazing

Nextcloud AI Assistant - local, privacy-respecting, and fully open source

AI Concepts to Get Familiar With

RAGs

Vector DBs

  • ChromaDB (🔍): An open-source vector database for efficient management of unstructured data.
  • Vector Admin (🛠️): A user-friendly administration interface for managing vector databases and more.
  • And much more! (🔗): Explore additional F/OSS vector databases for AI projects.
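
To make the idea of a vector DB more concrete, here is a minimal in-memory sketch. `TinyVectorStore` is a hypothetical class written for illustration only. It does not mirror the API of ChromaDB or any other product, and the three-dimensional vectors are made up by hand instead of coming from an embedding model.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (id, vector, payload) records
    and searches them by cosine similarity."""

    def __init__(self):
        self._records = {}

    def upsert(self, record_id, vector, payload=None):
        # Insert a new record or overwrite an existing one with the same id
        self._records[record_id] = (list(vector), payload)

    def query(self, vector, n_results=1):
        # Score every stored vector against the query, best match first
        scored = [
            (self._similarity(vector, stored), record_id, payload)
            for record_id, (stored, payload) in self._records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(record_id, payload) for _, record_id, payload in scored[:n_results]]

    @staticmethod
    def _similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


store = TinyVectorStore()
store.upsert("cat", [0.9, 0.1, 0.0], {"text": "Cats purr."})
store.upsert("dog", [0.8, 0.3, 0.0], {"text": "Dogs bark."})
store.upsert("car", [0.0, 0.1, 0.9], {"text": "Cars need fuel."})

# The query vector points along the third axis, so "car" is the closest record
print(store.query([0.0, 0.0, 1.0], n_results=1))
```

Real vector databases add persistence, metadata filtering and approximate nearest-neighbour indexes on top of this basic store-and-search idea.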

Prompt Engineering Tricks

This is a skeleton system prompt that normally works well for me:

But have a look at this Prompt Engineering Guide

#python3 pyopen.py > output.mdx

import os
from dotenv import load_dotenv
from openai import OpenAI  # pip install openai==1.30.5

# Load environment variables from the .env file
load_dotenv()

# Get the OpenAI API key from the environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Initialize the OpenAI client
client = OpenAI(
    api_key=api_key,
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an expert business analyst and will provide a detailed jira ticket content with the sections: context, story, acceptance criteria and references.",
        },
        {"role": "user", "content": "I need you to create a development ticket that: will improve current version of our component. "}

    ],
    model="gpt-4o-mini",
    temperature=0.3,
)

# print(chat_completion)
# Extract and print the content of the completed message
completed_message = chat_completion.choices[0].message.content
print(completed_message)

#python3 pyopen.py > output.mdx

import os
from dotenv import load_dotenv
from openai import OpenAI  # pip install openai==1.30.5

# Load environment variables from the .env file
load_dotenv()

# Get the OpenAI API key from the environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Initialize the OpenAI client
client = OpenAI(
    api_key=api_key,
)

mtg_must_have= """
* Purpose - Every meeting should have an objective—the reason why you're having the meeting. Before you schedule a meeting be sure you know what it is that you want out of the meeting.
* Agenda - An agenda outlines the plan for the meeting and lets participants know what to expect. It allows them to be prepared so they can actively participate and bring their expertise to the table.  
* Preparation - Before the meeting all participants should take some time to review the agenda and prepare any questions they may have.
"""

system_reminder = """
Never start your answers with "As an AI language model" when responding to questions.

No disclaimer is needed in any situation.

Write using simple language so a grandma could understand.

Use a conversational tone, be informal and approachable. Write like how people in real life would talk. A bit of slang is okay but don't overdo it.

If you don’t know the answer, just say you don’t know.

Your answers should be on point, succinct and useful. Each response should be written with maximum usefulness in mind rather than being polite.

If something seems out of reach, don’t ask the user to do it; instead, try to work through all your available options first.

When solving problems, take a breath and tackle them step by step.

My career depends on you giving me a good answer

Speak only what needs to be said. Keep the responses brief and to the point, avoid extra words and overly long explanations.
"""

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": f"""You are an expert meeting assistant. Very aware of the following:
                              {mtg_must_have} 
                            Remember also, that: {system_reminder}
                        """,
        },
        {"role": "user", "content": "Who are you and what can you do?"}

    ],
    model="gpt-4o-mini",
    temperature=0.3,
)

# Extract and print the content of the completed message
completed_message = chat_completion.choices[0].message.content
print(completed_message)

Zero Shot Examples

  1. Think step by step.

  2. Explain [topic] starting with simple and easy terms that any beginner can understand. Then level up and continue with a paragraph at intermediate level, then advanced. Try not to repeat the same sentences.

  3. Compare [options 1,2,3] starting with simple and easy terms that any beginner can understand. Then level up and continue with a paragraph at intermediate level, then advanced. Try not to repeat the same sentences and include pros and cons of the options.

  4. Explain the topic of sales and branding for data analytics in 4/5 paragraphs starting with simple and easy terms that any beginner can understand. Then level up and continue with a paragraph at intermediate level, then advanced. Try not to repeat the same sentences.

  5. Content Creation:

    • Write me an SEO title better than: Crypto 101 - The basics
    • Tell me a 100-character description for that blog post about xxx for data analytics and also a 50-word summary (SEO friendly)
    • Write an engaging introduction paragraph for a blog post about how to build your brand for data analytics, and also a cool title for SEO
  6. I am writing a LinkedIn article and I need some calls to action (CTAs) so that people visit my blog posts on the following topics: website creation

  7. Analyze the writing style from the text below and write a 200 word piece on [topic]

  8. Make a title for the blog post about xxx, a description (SEO friendly, Google, 80 chars) and a 50-word summary. Also make the first paragraph user friendly.

Zero-Shot vs. Few-Shot

What is the purpose of fine-tuning prompts in working with language models?

To help the model understand the task better and provide a more accurate response
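
The difference between the two approaches is easiest to see in the message list itself. This is a small sketch with hypothetical helper names (`zero_shot`, `few_shot`) and a made-up sentiment task. It only builds the messages and does not call any model.

```python
def zero_shot(task, text):
    """Zero-shot: only the instruction and the input, no examples."""
    return [
        {"role": "system", "content": task},
        {"role": "user", "content": text},
    ]


def few_shot(task, examples, text):
    """Few-shot: the same instruction plus worked examples as earlier turns."""
    messages = [{"role": "system", "content": task}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": text})
    return messages


task = "Classify the sentiment of the review as positive or negative."
examples = [
    ("The setup took five minutes, great tool!", "positive"),
    ("It crashed every time I opened it.", "negative"),
]

print(len(zero_shot(task, "Works fine on my old laptop.")))  # 2 messages
print(len(few_shot(task, examples, "Works fine on my old laptop.")))  # 6 messages
```

Either list can be passed as the messages argument of client.chat.completions.create, as in the scripts above. The few-shot version costs more tokens, but it shows the model the exact answer format you expect.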

Interesting LLM Tools