KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models.

It’s a single, self-contained distributable from Concedo that builds off llama.cpp and adds a versatile Kobold API endpoint.

The KoboldCpp Project

Installing koboldcpp

Check latest releases: https://github.com/LostRuins/koboldcpp/releases/

wget https://github.com/LostRuins/koboldcpp/releases/download/v1.58/koboldcpp-linux-x64
chmod +x koboldcpp-linux-x64
#curl -fLo koboldcpp-linux-x64 https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp-linux-x64 && chmod +x koboldcpp-linux-x64

./koboldcpp-linux-x64
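Run without arguments, the binary opens a launcher where you pick the model interactively. A non-interactive launch might look like the sketch below; --model, --port, --contextsize and --gpulayers are koboldcpp flags, and the model filename is only a placeholder for whatever GGUF file you downloaded.

```shell
# Sketch of a non-interactive launch (filename is a placeholder):
# --model       path to a local GGML/GGUF model file
# --port        port for the web UI and API (5001 is the default)
# --contextsize tokens of context the model is loaded with
# --gpulayers   layers to offload to the GPU, if you have a supported one
./koboldcpp-linux-x64 --model vicuna-7b.Q4_K_M.gguf --port 5001 --contextsize 4096 --gpulayers 0
```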

Select a model. For example, you can download one from: https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/tree/main or https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/tree/main

Then open the web UI at http://localhost:5001
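Besides the web UI, the server also exposes the Kobold API advertised above. A minimal generation request against the standard /api/v1/generate endpoint (assuming the server is running on the default port) could look like:

```shell
# POST a prompt to the running koboldcpp server and print the JSON response.
# max_length caps the number of generated tokens; temperature controls sampling.
curl -s http://localhost:5001/api/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Once upon a time", "max_length": 50, "temperature": 0.7}'
```

The response is a JSON object containing the generated text, so the endpoint is easy to script against from any language.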

Trying a MoE LLM

mixtral-8x7b is the original mixture-of-experts (MoE) model from Mistral AI; dolphin-2.5-mixtral-8x7b is a fine-tune of it.

You can read more at: https://mistral.ai/news/mixtral-of-experts/

GGUF quantizations of the dolphin fine-tune are available at: https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/tree/main
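To fetch a single quantization directly rather than browsing the repo, you can wget the file's resolve URL. The filename below follows TheBloke's usual naming scheme and is an assumption; verify it against the repo's file listing, and pick a quant size that fits your RAM (Q4_K_M is a common middle ground).

```shell
# Download one quantization of the dolphin Mixtral fine-tune (large file,
# tens of GB for an 8x7B model) and launch koboldcpp with it.
wget https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/resolve/main/dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf
./koboldcpp-linux-x64 --model dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf
```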


FAQ

Why C++ and not Python?

  • Performance: C++ typically offers better performance than Python due to its lower-level nature and more direct control over hardware resources. For computationally intensive AI tasks, especially those involving large datasets or complex algorithms, C++ can provide significant speed advantages.

  • Development Time: Python is often favored for its simplicity and ease of development. It offers concise syntax, dynamic typing, and extensive libraries (such as TensorFlow, PyTorch, and scikit-learn) that make it convenient for prototyping and experimenting with AI models. In contrast, C++ development may require more time and effort due to its stricter syntax and manual memory management.

  • Portability: Python’s high-level nature and platform independence make it more portable than C++, which is typically compiled to machine code specific to the target platform. Python code can run on various platforms without modification, whereas C++ code may need to be recompiled for different platforms.