Whishper is an awesome open-source project built on top of Whisper models, with LibreTranslate handling translation, that we can SelfHost for local audio transcription.

The Whishper Project

Whishper - Open-source, local-first audio transcription and subtitling with UI

SelfHosting Whishper

Whishper CPU Only

You will need the docker-compose configuration and a .env file to set your variables.

You can always have a look at Whishper Official Docs

Whishper Docker-Compose

version: "3.9"

services:
  mongo:
    image: mongo
    env_file:
      - .env
    restart: unless-stopped
    volumes:
      - ./whishper_data/db_data:/data/db
      - ./whishper_data/db_data/logs/:/var/log/mongodb/
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${DB_USER:-whishper}
      MONGO_INITDB_ROOT_PASSWORD: ${DB_PASS:-whishper}
    expose:
      - 27017
    command: ['--logpath', '/var/log/mongodb/mongod.log']

  translate:
    container_name: whisper-libretranslate
    image: libretranslate/libretranslate:latest
    restart: unless-stopped
    volumes:
      - ./whishper_data/libretranslate/data:/home/libretranslate/.local/share
      - ./whishper_data/libretranslate/cache:/home/libretranslate/.local/cache
    env_file:
      - .env
    tty: true
    environment:
      LT_DISABLE_WEB_UI: "True"
      LT_UPDATE_MODELS: "True"
    expose:
      - 5000
    networks:
      default:
        aliases:
          - translate
    healthcheck:
      test: ['CMD-SHELL', './venv/bin/python scripts/healthcheck.py']
      interval: 2s
      timeout: 3s
      retries: 5

  whishper:
    pull_policy: always
    image: pluja/whishper:${WHISHPER_VERSION:-latest}
    env_file:
      - .env
    volumes:
      - ./whishper_data/uploads:/app/uploads
      - ./whishper_data/logs:/var/log/whishper
    container_name: whishper
    restart: unless-stopped
    networks:
      default:
        aliases:
          - whishper
    ports:
      - 8082:80
    depends_on:
      - mongo
      - translate
    environment:
      PUBLIC_INTERNAL_API_HOST: "http://127.0.0.1:80"
      PUBLIC_TRANSLATION_API_HOST: ""
      PUBLIC_API_HOST: ${WHISHPER_HOST:-}
      PUBLIC_WHISHPER_PROFILE: cpu
      WHISPER_MODELS_DIR: /app/models
      UPLOAD_DIR: /app/uploads
      CPU_THREADS: 4
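
The compose file above hard-codes CPU_THREADS: 4. As a rough guide for picking that value on your own machine, the sketch below prints the number of logical CPUs minus one. The helper name suggest_cpu_threads and the "leave one core free" rule are assumptions made for illustration, not a Whishper recommendation.

```python
import os


def suggest_cpu_threads(reserve: int = 1) -> int:
    """Return the logical CPU count minus `reserve`, never below 1.

    Hypothetical helper: the 'reserve one core' rule is an assumption.
    """
    total = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, total - reserve)


# Print a line that can be compared with the CPU_THREADS entry above.
print(f"CPU_THREADS: {suggest_cpu_threads()}")
```

If the printed number is lower than 4, it is probably worth lowering CPU_THREADS to match.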

Whishper .env

# Libretranslate Configuration
## Check out https://github.com/LibreTranslate/LibreTranslate#configuration-parameters for more libretranslate configuration options
LT_LOAD_ONLY=es,en,fr

# Whisper Configuration
WHISPER_MODELS=tiny,small
WHISHPER_HOST=http://127.0.0.1:8082

# Database Configuration
DB_USER=whishper
DB_PASS=whishper
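
The sample values DB_USER=whishper and DB_PASS=whishper are easy to guess. A minimal sketch for generating a random password to paste into the .env, assuming you do this before the first start (the MongoDB image only applies the MONGO_INITDB_* credentials when its data directory is still empty). The helper name make_db_password is hypothetical.

```python
import secrets


def make_db_password(n_bytes: int = 24) -> str:
    """Return a random URL-safe string (letters, digits, '-' and '_').

    Hypothetical helper; URL-safe characters avoid quoting issues in .env files.
    """
    return secrets.token_urlsafe(n_bytes)


# Print a ready-to-paste line for the .env file.
print(f"DB_PASS={make_db_password()}")
```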

Just configure your desired variables and deploy with:

docker-compose up -d

Your Whishper instance will be waiting for you at http://localhost:8082
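
The first start can take a while, since images and models are downloaded. A small sketch for polling the URL above until something answers, using only the Python standard library; the function name wait_until_up and the retry settings are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` until an HTTP response arrives or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except OSError:
            # Connection refused, timeout, etc.: wait and try again.
            time.sleep(delay)
    return False
```

Calling wait_until_up("http://localhost:8082") returns True as soon as the web UI responds, and False if nothing answers within the given number of attempts.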


FAQ

Other F/OSS Audio Tools

How to Install AudioCraft?

  1. Get Python installed
  2. Get Familiar with Virtual Environments
  3. Installing AudioCraft with Python Step by Step 👇
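    # Create and activate a virtual environment
    python3 -m venv audiocraft
    source audiocraft/bin/activate

    # ffmpeg is a system package (Debian/Ubuntu), so it needs root privileges
    sudo apt install ffmpeg

    # Clone the repository and install its dependencies
    git clone https://github.com/facebookresearch/audiocraft.git ./audio
    cd audio
    python -m pip install -r requirements.txt
    pip install rich

    # Launch the MusicGen Gradio demo (--share also creates a public link)
    python -m demos.musicgen_app --share

    # Leave the virtual environment when you are done
    #deactivate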
    

And the Gradio UI will be available at http://localhost:7860

How to Install Bark?

Testing Bark with Python Step by Step 👇
#pip install --upgrade setuptools wheel

pip install git+https://github.com/suno-ai/bark.git

pip install git+https://github.com/huggingface/transformers.git

#pip install IPython

Try Bark Audio Generation with:

python -m bark --text "Hello, my name is Suno." --output_filename "example.wav"

or directly in Python:

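import scipy.io.wavfile
from transformers import AutoProcessor, BarkModel

# Download (or load from cache) the Bark processor and model
processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

# Speaker preset that controls the generated voice
voice_preset = "v2/en_speaker_6"

inputs = processor("Hello, my dog is cute", voice_preset=voice_preset)

# Generate the waveform and convert it to a 1-D NumPy array
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

# Write the result as a WAV file at the model's sample rate
sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)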