The SQL Guide

SQL is a programming language used for managing and manipulating data in databases. It is used for creating, updating, and deleting data in databases, as well as for querying data from databases.

It can be used to retrieve data from multiple tables, perform calculations on the data, and update or delete data in a database. SQL can also be used to create stored procedures, triggers, and views, and is an essential tool for data analysis, data migration, and data integration.

SQL Concepts

DDL - Data Definition Language

Is a subset of SQL. It is a language for describing data and its relationships in a database:

  • CREATE : to create objects in database
  • ALTER : alters the structure of database
  • DROP : delete objects from database (REMOVES THE ENTIRE TABLE AND STRUCTURE)
  • TRUNCATE: removes the data (rows) but preserves the structure
  • RENAME : rename an objects

DML - Data Manipulation Language

Used for adding (inserting), deleting, and modifying (updating) data in a database:

  • SELECT: retrieve data from the database
  • INSERT: insert data into a table
  • UPDATE: update existing data within a table
  • DELETE: deletes all records from a table, space for the records remain

DCL - Data Control Language

Used to control access to data stored in a database (authorization):

  • GRANT: allow specified users to perform specified tasks.
  • REVOKE: cancel previously granted or denied permissions.

TCL - Transaction Control Language

  • COMMIT: Commit command is used to permanently save any transaction into the database.
  • ROLLBACK: This command restores the database to last committed state. It is also used with savepoint command to jump to a

SQL Quick Guide

Filtering

SELECT * FROM Clients
WHERE Name='Yosua';
SELECT * FROM Clients
WHERE Name='Yosua' AND Nick='Cerdo' OR Job='Dr.Yosua';

Aggregations

SELECT Region, COUNT(CustomerID)
FROM Clients
GROUP BY Region
ORDER BY COUNT(CustomerID) DESC;

Unions

UNION combines the results of two or more SELECT queries vertically (appending rows), while a JOIN combines rows from two or more tables horizontally (adding columns) based on a related column between them.

The UNION operation is used to combine the results of two or more SELECT queries into a single result set. It eliminates duplicate rows from the results. Each SELECT query within the UNION must have the same number of columns and those columns must have similar data types. The columns in each SELECT statement must also be in the same order. The UNION operator is used when you want to combine rows from similar tables or datasets, essentially stacking the results vertically.

For example, if you have two tables - “Sales2019” and “Sales2020” - with the same structure, and you want to create a list of all unique customers in both years, you could use a UNION to do this.

SELECT CustomerName FROM Sales2019
UNION
SELECT CustomerName FROM Sales2020;

The Joins

But what are joins?

HAVING AND WHERE

You can use WHERE to prefilter the input that your HAVING statement will find. Remember that where will not work with aggregated functions and it will be having the go to when we want to filter the output from a group by.

SELECT Region, COUNT(Boquitas), AVG(Cerdeza)
FROM Clients
WHERE Job like 'Dr.%'
GROUP BY Region
HAVING COUNT(Boquitas) > 5
ORDER BY COUNT(Boquitas) DESC;

So, using both clauses in the same query allows you to apply filtering conditions both before and after aggregation, providing you with precise control over the data you retrieve.

Window Functions

A SQL window function is a function that operates on a set of rows within a result set, also known as a window. Window functions are often used to calculate running totals, moving averages, and other aggregations over a set of rows.

Window functions are similar to aggregate functions, but they operate on a subset of the rows in the result set, rather than the entire result set. This makes them more flexible than aggregate functions, as they can be used to calculate aggregations over a specific range of rows, such as the previous 10 rows or the next 20 rows.

Window functions are also more efficient than aggregate functions, as they only need to calculate the aggregation for the rows in the window, rather than for the entire result set.

SUM -> Rolling Sum

SELECT o.occurred_at,
       SUM(o.gloss_qty) OVER(ORDER BY o.occurred_at) as running_gloss_orders
  FROM demo.orders o

ROW_NUMBER -> Get the first element from every group

  WITH
	order_ranks as (
	   SELECT o.id,
	          o.account_id,
	          o.gloss_qty,
	          o.gloss_amt_usd,
	          ROW_NUMBER() OVER(PARTITION BY o.account_id 
                           ORDER BY o.occurred_at DESC) as acct_order_rank
      FROM demo.orders o
)

	SELECT *
	 FROM  order_ranks
    WHERE  acct_order_rank = 1

LAG -> Events occurred before

 SELECT o.id,
            o.occurred_at,
	        o.gloss_qty,
	        LAG(gloss_qty,1) OVER(ORDER BY o.occurred_at) as order_lag_1,
	        LAG(gloss_qty,2) OVER(ORDER BY o.occurred_at) as order_lag_2,
	        LAG(gloss_qty,3) OVER(ORDER BY o.occurred_at) as order_lag_3
     FROM   demo.orders o

CTE

In SQL, a WITH statement, also known as a Common Table Expression (CTE), is used to define a temporary result set that can be referred to within the context of a SELECT, INSERT, UPDATE, or DELETE statement. Common Table Expressions provide a way to break down complex queries into more manageable, named, and often recursive parts.

The basic syntax of a WITH statement or CTE is as follows:

WITH cte_name (column1, column2, ...) AS (
    -- CTE query definition
    SELECT ...
    FROM ...
    WHERE ...
)
-- Main query that uses the CTE
SELECT ...
FROM cte_name
WHERE ...
  • cte_name: This is the name you give to the Common Table Expression. It’s like a temporary table name that you can use within the current SQL statement.

  • (column1, column2, …): You can optionally specify the column names that the CTE will return. This is useful if you want to define and name specific columns in the CTE.

  • AS: This keyword is used to introduce the CTE query definition.

  • CTE Query Definition: This is where you define the SELECT statement that forms the CTE. It can reference other tables and CTEs, and it’s evaluated only once, creating a temporary result set that can be used in subsequent parts of the query.

  • Main Query: After defining the CTE, you can use it within the main query. You reference the CTE by its name (cte_name) as if it were an actual table or subquery. This allows you to build complex queries step by step.

WITH CategorySales AS (
    SELECT
        product_category,
        SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY product_category
)
SELECT *
FROM CategorySales
WHERE total_sales > 10000;

Subqueries

A subquery, also known as a nested query, is a query nested inside another SQL query. Subqueries are used to retrieve data based on the results of another query. Subqueries can appear in various parts of a SQL statement, including the SELECT clause, FROM clause, WHERE clause, and HAVING clause. Here’s an example of a subquery in the WHERE clause:

Let’s say we have two tables: employees and departments. The employees table contains information about employees, including their department, and the departments table contains information about departments, including their budget.

Suppose you want to retrieve all employees from the “Sales” department whose salaries are greater than the average salary in the “Sales” department. You can use a subquery for this purpose:

SELECT employee_name, department, salary
FROM employees
WHERE department = 'Sales' AND salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = 'Sales'
);

In this example:

  1. The main query retrieves employee information for the “Sales” department (WHERE department = 'Sales').

  2. Inside the WHERE clause of the main query, there’s a subquery (SELECT AVG(salary) FROM employees WHERE department = 'Sales'). This subquery calculates the average salary of employees in the “Sales” department.

  3. The main query compares each employee’s salary to the result of the subquery (salary > ...). It retrieves employees whose salary is greater than the average salary in the “Sales” department.

The subquery is executed for each row returned by the main query, allowing you to filter the main query’s results based on dynamic calculations using data from the same or related tables.

Subqueries can be powerful tools for extracting data based on conditions or calculations involving other rows or tables, and they are a fundamental part of SQL for complex data retrieval and manipulation tasks.

FAQ

What are the uses of Indexes in SQL? Pros & Cons

Indexes in SQL are used to improve the performance of database queries by allowing for efficient data retrieval. They provide a way to quickly locate specific rows in a table based on the values of one or more columns. Here are some uses, pros, and cons of using indexes in SQL:

  • Uses of Indexes:

    • Faster Data Retrieval: Indexes speed up the query execution process by reducing the number of disk I/O operations required to locate the desired data.
    • Improved Query Performance: Indexes help optimize query plans, allowing the database engine to execute queries more efficiently.
    • Constraint Enforcement: Indexes can be used to enforce unique constraints and primary key constraints, ensuring the integrity of data.
  • Pros of Indexes:

    • Faster Query Execution: Indexes can significantly speed up the execution of SELECT statements, especially when dealing with large tables.
    • Improved Data Search: Indexes enable quick data lookup based on specific column values, making search operations more efficient.
    • Enhanced Sorting and Grouping: Indexes can improve the performance of sorting and grouping operations in queries.
  • Cons of Indexes:

    • Increased Storage Space: Indexes require additional storage space to store the index data structures. This can be a concern for large databases or tables with many indexes.
    • Overhead on Write Operations: When data is modified (inserted, updated, or deleted), indexes must be updated accordingly, which adds overhead to write operations and can impact overall performance.
    • Index Maintenance: Indexes need to be maintained and periodically rebuilt or optimized to ensure optimal performance. This maintenance operation can consume system resources and may impact database availability during the process.

It’s important to note that the decision to create an index should be based on careful analysis of the specific workload and query patterns of your application. Over-indexing or creating unnecessary indexes can have negative consequences on overall performance. Therefore, it’s crucial to strike the right balance between indexing and query optimization based on the specific needs of your application.

Optimizing SQL Queries

Optimizing SQL queries can help to improve the performance of a database and the applications that use it.

  • Some best practices for SQL query optimization are:
    • Use WHERE clauses instead of HAVING clauses to define filters.
    • Use SELECT instead of SELECT *.
    • Avoid running queries in a loop.
    • Simplify joins.
    • Use indexes whenever possible.
    • An index consists of a data structure that stores a sorted copy of selected columns from a table.
    • It includes a reference to the original table’s rows, which makes it possible to quickly locate specific rows that match a given search criterion.
    • Faster Data Retrieval: Indexes allow the database engine to quickly locate the rows that satisfy a query’s WHERE clause conditions.
    • Efficient Sorting: Indexes can speed up sorting operations (e.g., ORDER BY clauses) because the data is already sorted in the index.
    • Enhanced JOIN Performance: Indexes can improve the performance of JOIN operations when joining multiple tables.
    • Avoid using multiple OR conditions in the WHERE clause.
    • Avoid using subqueries in the WHERE clause.
    • Avoid using cursors and temporary tables.
    • Avoid using functions in the WHERE clause.
    • Use the EXPLAIN command to view the query execution plan
    • Use Appropriate Naming Convention
    • Keep Wild cards at the End of Phrases
    • Avoid Cursors
      • A cursor is a temporary work area created in the system memory when a SQL statement is executed. A cursor is a set of rows together with a pointer that identifies a current row. It is a database object to retrieve data from a result set one row at a time. But the use of a cursor is not good because it takes a long time because it fetches data row by row.
      • So we can use a replacement of cursors. A Temporary table, For or While loop may be a replacement for a cursor in some cases.
    • Union vs Union ALL:
      • When using the UNION statement, keep in mind that, by default, it performs the equivalent of a SELECT DISTINCT on the final result set. In other words, UNION takes the results of two like recordsets, combines them, and then performs a SELECT DISTINCT to eliminate duplicate rows. This process occurs even if there are no duplicate records in the final record set.
      • If you know there are duplicate records, which presents a problem for your application, then use the UNION statement to eliminate the duplicate rows.
      • On the other hand, if you know that there will never be any duplicate rows, or if there are, and this presents no problem to your application, then you should use the UNION ALL statement instead of the UNION statement. The advantage of the UNION ALL is that it does not perform the SELECT DISTINCT function, which saves a lot of unnecessary SQL Server resources from being used.