So you already know some SQL.
And now you want to build AI apps fast.
Then you should get to know MindsDB, which helps you create AI tools that need real-time data to perform their tasks.
The MindsDB Project
You can find MindsDB project details and source code at:
- The MindsDB Site
- The MindsDB Source Code on GitHub
- The Docker Container to deploy MindsDB
- Mixed License: ELv2 and MIT
You can give MindsDB a try with Vector DBs, for example with your SelfHosted ChromaDB.
MindsDB Integrations - ML & LLMs
There are several ways to integrate MindsDB:
- LLMs: for free models that run locally, we are lucky to have a very simple option:
  - Ollama - Set up Ollama locally with Docker
- Vector DBs:
  - LanceDB, Pinecone, Qdrant…
- Time Series DBs:
  - InfluxDB - F/OSS, and you can deploy it with Docker. Plays well with IoT and Grafana.
- And more: https://docs.mindsdb.com/integrations/data-integrations/all-data-integrations
SelfHosting MindsDB with Docker
First Things First - Get Docker!
This is an important step, and recommended for any SelfHosting project: get Docker installed.
On Linux, it only takes a couple of commands:
sudo apt-get update && sudo apt-get upgrade && curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh && sudo docker version
As per the docs, we can run it with the Docker CLI:
docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
But for proper SelfHosting and container management, let's SelfHost MindsDB with docker-compose:
version: '3.9'
services:
  mindsdb:
    container_name: mindsdb-container
    image: mindsdb/mindsdb
    ports:
      - "47334:47334" # UI port
    volumes:
      - mindsdb_data:/mindsdb
volumes:
  mindsdb_data:
    driver: local
Then, just go to: http://localhost:47334
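If the page does not load, it helps to confirm that the container is actually listening on the published port. Here is a minimal sketch of that check using only the Python standard library. The helper name `is_port_open` is made up for this example, and it only tests that the TCP port accepts connections, not that MindsDB itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 47334 is the UI port published in the compose file above.
    print("MindsDB UI reachable:", is_port_open("localhost", 47334))
```

A `False` result right after `docker-compose up` is not necessarily an error, since the container may need some time to start.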
How to use MindsDB
MindsDB currently supports more than 100 data sources.
MindsDB - Web Crawler
The primary purpose of a web crawler is to collect data from the internet for various purposes, such as search engine indexing, content scraping, website analysis, and more.
With MindsDB, we can use a web crawler to collect web data for training models, building domain-specific chatbots, or fine-tuning LLMs.
Initialize a web crawler:
CREATE DATABASE my_web
WITH ENGINE = 'web';
Get content from a website:
SELECT *
FROM my_web.crawler
WHERE url = 'docs.mindsdb.com'
LIMIT 1;
-- Use LIMIT 10 instead to crawl 10 internal pages
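The same query can also be sent from a script instead of the web editor. The sketch below assumes the SelfHosted instance from the compose file is listening on `http://localhost:47334` and uses MindsDB's HTTP endpoint `/api/sql/query`, which accepts a JSON body with a `query` field. The helper names `build_sql_request` and `run_sql` are made up for this example, and the response shape is not inspected here, so check the REST API docs for your MindsDB version.

```python
import json
import urllib.request


def build_sql_request(sql: str, base_url: str = "http://localhost:47334") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SQL statement."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/sql/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql: str, base_url: str = "http://localhost:47334", timeout: float = 60.0) -> dict:
    """Send the statement to MindsDB and return the decoded JSON response."""
    request = build_sql_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running MindsDB instance and the my_web database):
# result = run_sql("SELECT * FROM my_web.crawler WHERE url = 'docs.mindsdb.com' LIMIT 1;")
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the network.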
A Few More Tricks with the MindsDB Web Crawler
- Or from multiple websites
SELECT *
FROM my_web.crawler
WHERE url IN ('docs.mindsdb.com', 'docs.python.org')
LIMIT 1;
- Even PDF Content
SELECT *
FROM my_web.crawler
WHERE url = '<link-to-pdf-file>'
LIMIT 1;
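When the list of sites comes from a script rather than being typed by hand, the three query shapes above can be generated by one small helper. This is only a sketch: `crawler_query` is a made-up name, the database name `my_web` is taken from the earlier `CREATE DATABASE` statement, and doubling single quotes is standard SQL escaping that is assumed, not verified, to be accepted by MindsDB.

```python
def crawler_query(urls, limit=1, database="my_web"):
    """Return a SELECT against <database>.crawler for one or more URLs."""
    if not urls:
        raise ValueError("at least one URL is required")
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    def quote(value):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"

    quoted = [quote(url) for url in urls]
    if len(quoted) == 1:
        condition = f"url = {quoted[0]}"
    else:
        condition = f"url IN ({', '.join(quoted)})"

    # The database name is interpolated as-is, so only pass trusted values.
    return f"SELECT *\nFROM {database}.crawler\nWHERE {condition}\nLIMIT {int(limit)};"


if __name__ == "__main__":
    print(crawler_query(["docs.mindsdb.com", "docs.python.org"]))
```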
FAQ
Other F/OSS Ways to Do Sentiment Analysis?
- You can have a look at the Detoxify Project
- It uses PyTorch and Transformers
- Apache v2 Licensed
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at [email protected].
NER
NER stands for Named Entity Recognition.
It is a subtask of information extraction that seeks to locate and classify named entities mentioned in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
NER is not directly used for detecting Personally Identifiable Information (PII) or for sentiment analysis, but it can be an important component in those tasks. For instance:
- PII Detection: While NER itself does not specifically detect PII, it can identify entities like names, addresses, or other specifics that might be considered PII. Additional rules or models are typically needed to specifically classify data as PII.
- Sentiment Analysis: NER is generally not used directly in sentiment analysis. Sentiment analysis focuses on determining the attitude or emotion conveyed in a piece of text, such as positive, negative, or neutral sentiments. However, understanding what entities are being discussed can provide context that might be useful in a more nuanced analysis of sentiment.
NER is a foundational NLP tool that helps in structuring text for deeper analysis, which can be useful in tasks like PII detection and sentiment analysis, but it is not solely sufficient for these tasks.
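To make the "additional rules" idea concrete, here is a small sketch of a rule layer that sits on top of NER output. Everything in it is an assumption made for illustration: the `Entity` type, the label names `PER` and `LOC`, and the two regular expressions are hypothetical and much simpler than what a real PII detector needs. The entities would come from whatever NER model you use; here they are passed in by hand.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """One span produced by an NER model (hypothetical shape)."""
    text: str
    label: str  # e.g. "PER", "ORG", "LOC"


# Assumed rule set: which NER labels count as PII, plus regexes for
# identifiers that NER models usually do not tag.
PII_LABELS = {"PER", "LOC"}
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_pii(text, entities):
    """Combine NER entities and regex matches into (kind, value) findings."""
    findings = [(e.label, e.text) for e in entities if e.label in PII_LABELS]
    for kind, pattern in PII_PATTERNS.items():
        findings.extend((kind, match.group()) for match in pattern.finditer(text))
    return findings


if __name__ == "__main__":
    sample = "Ada Lovelace works at Acme; reach her at ada@example.com or +44 20 7946 0958."
    ner_output = [Entity("Ada Lovelace", "PER"), Entity("Acme", "ORG")]
    for kind, value in find_pii(sample, ner_output):
        print(kind, value)
```

The organization is left out because `ORG` is not in the assumed PII label set, which shows why the rules, and not NER alone, decide what counts as PII.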
GLiNER: Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
- https://github.com/flairNLP/flair
- MIT Licensed: https://github.com/flairNLP/flair?tab=License-1-ov-file#readme
A very simple framework for state-of-the-art Natural Language Processing (NLP)