So you know some SQL already.

And now you want to build AI apps fast.

You need to get to know MindsDB, which helps us create AI tools that need real-time data to perform their tasks.

The MindsDB Project

You can find MindsDB project details and source code at:

You can give MindsDB a try with vector DBs, for example with your self-hosted ChromaDB.

MindsDB Integrations - ML & LLMs

There are several ways to integrate MindsDB:

SelfHosting MindsDB with Docker

First Things First - Get Docker! 🐋

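This step is important and highly recommended for any self-hosting project: get Docker installed.

On a Debian-based Linux (one that uses apt-get), it takes just these two commands:

# Update the system and download Docker's convenience install script
sudo apt-get update && sudo apt-get upgrade && curl -fsSL https://get.docker.com -o get-docker.sh
# Run the installer, then confirm that Docker responds
sudo sh get-docker.sh && sudo docker version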

As per the docs, we can run it with the Docker CLI:

docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb

But for proper self-hosting and container management, let's self-host MindsDB with docker-compose:

version: '3.9'

services:
  mindsdb:
    container_name: mindsdb-container
    image: mindsdb/mindsdb
    ports:
      - "47334:47334"  # HTTP API and web editor
      - "47335:47335"  # MySQL API, as in the docker run command above
    volumes:
      - mindsdb_data:/mindsdb

volumes:
  mindsdb_data:
    driver: local

Then, just go to: http://localhost:47334
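If the page does not load right away, the container may still be starting. The following is a minimal sketch, not part of MindsDB itself, that checks whether anything accepts TCP connections on the published port. The helper name `port_is_open` is made up for this example, and it assumes the `47334` port mapping from the compose file above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 47334 is the port published in the compose file (an assumption of this sketch).
    if port_is_open("localhost", 47334):
        print("Port 47334 is reachable, try http://localhost:47334")
    else:
        print("Nothing is listening on port 47334 yet")
```

An open port only shows that something is listening. It does not prove that MindsDB has finished starting, so treat it as a first check.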

How to use MindsDB

MindsDB currently supports more than 100 data sources.

MindsDB - Web Crawler

The primary purpose of a web crawler is to collect data from the internet for various purposes, such as search engine indexing, content scraping, website analysis, and more.

With MindsDB, we can use a web crawler to collect web data for training models, building domain-specific chatbots, or fine-tuning LLMs.

Initialize a web crawler:

CREATE DATABASE my_web 
WITH ENGINE = 'web';

Get content from a website:

SELECT * 
FROM my_web.crawler 
WHERE url = 'docs.mindsdb.com' 
LIMIT 1;
-- Use LIMIT 10 instead to crawl 10 internal pages.

Few More Tricks with MindsDB Web Crawler 👇

  • Or from multiple websites:

SELECT *
FROM my_web.crawler
WHERE url IN ('docs.mindsdb.com', 'docs.python.org')
LIMIT 1;

  • Even PDF content:

SELECT *
FROM my_web.crawler
WHERE url = '<link-to-pdf-file>'
LIMIT 1;
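The crawler queries above can also be built and sent from a script. The sketch below is one possible way to do it, under stated assumptions: the helper names `build_crawler_query` and `run_query` are invented for this example, the `my_web` database already exists, and the instance accepts SQL over HTTP at `/api/sql/query` on port `47334`. Check that endpoint against the MindsDB REST API docs for your version.

```python
import json
import urllib.request


def build_crawler_query(urls, limit=1, database="my_web"):
    """Build a SELECT against the web crawler table for a list of URLs."""
    if not urls:
        raise ValueError("at least one URL is required")
    if not database.isidentifier():
        raise ValueError("database must be a plain identifier")
    # Double any single quote so a URL cannot break out of the SQL string literal.
    quoted = ", ".join("'" + u.replace("'", "''") + "'" for u in urls)
    condition = f"url = {quoted}" if len(urls) == 1 else f"url IN ({quoted})"
    return f"SELECT * FROM {database}.crawler WHERE {condition} LIMIT {int(limit)};"


def run_query(sql, base_url="http://localhost:47334"):
    """POST a SQL statement to the assumed /api/sql/query endpoint; return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/api/sql/query",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Building the query needs no running server.
print(build_crawler_query(["docs.mindsdb.com", "docs.python.org"], limit=2))
```

With a running instance, `run_query(build_crawler_query(["docs.mindsdb.com"]))` would send the single-site query from earlier. The shape of the JSON response is not covered here.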

FAQ

Other F/OSS Ways to Do Sentiment Analysis?

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers.

NER

NER stands for Named Entity Recognition.

It is a subtask of information extraction that seeks to locate and classify named entities mentioned in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

NER is not directly used for detecting Personally Identifiable Information (PII) or for sentiment analysis, but it can be an important component in those tasks. For instance:

  1. PII Detection: While NER itself does not specifically detect PII, it can identify entities like names, addresses, or other specifics that might be considered PII. Additional rules or models are typically needed to specifically classify data as PII.

  2. Sentiment Analysis: NER is generally not used directly in sentiment analysis. Sentiment analysis focuses on determining the attitude or emotion conveyed in a piece of text, such as positive, negative, or neutral sentiments. However, understanding what entities are being discussed can provide context that might be useful in a more nuanced analysis of sentiment.

NER is a foundational NLP tool that helps in structuring text for deeper analysis, which can be useful in tasks like PII detection and sentiment analysis, but it is not solely sufficient for these tasks.
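To make the PII point concrete, here is a minimal sketch of layering simple rules on top of NER output. Everything in it is an assumption for illustration: the `find_pii` helper, the choice of labels treated as identifying, the two regex patterns, and the hand-written `entities` list that stands in for the output of a real NER model.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Entity labels treated as potentially identifying (an assumption of this sketch).
PII_LABELS = {"PERSON", "LOCATION"}


def find_pii(text, entities):
    """Combine NER output with regex rules.

    `entities` is a list of (span_text, label) pairs, standing in for the
    output of whatever NER model is used.
    """
    # Keep only the NER entities whose label we consider identifying.
    findings = [(span, label) for span, label in entities if label in PII_LABELS]
    # Add rule-based matches for items this sketch handles with regexes instead.
    findings += [(m.group(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    findings += [(m.group(), "PHONE") for m in PHONE_RE.finditer(text)]
    return findings


text = "Contact Jane Doe at jane.doe@example.com or +1 555 010 9999 about Acme Corp."
entities = [("Jane Doe", "PERSON"), ("Acme Corp", "ORGANIZATION")]  # hypothetical NER output
print(find_pii(text, entities))
```

The NER step supplies the name, the rules supply the email address and phone number, and the organization is left out because this sketch does not treat it as PII.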

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024

A very simple framework for state-of-the-art Natural Language Processing (NLP)