SQL : JOINS, Union, Subqueries, CTE

In PostgreSQL, joins are used to retrieve data from multiple tables based on a related column. There are several types of joins, including INNER, LEFT, RIGHT, and FULL (OUTER). Here's a breakdown of each with an example.

Example Setup:

Suppose we have two tables: students and courses.

students table: | id | name | course_id | | --- | ------- | --------- | | 1 | Alice | 101 | | 2 | Bob | NULL | | 3 | Charlie | 102 | | 4 | David | 103 |

courses table: | id | course_name | | --- | --------------- | | 101 | Math | | 102 | Science | | 104 | History |

1. INNER JOIN:

An INNER JOIN returns rows when there is a match in both tables.

SELECT students.name, courses.course_name
FROM students
INNER JOIN courses ON students.course_id = courses.id;

Result: | name | course_name | | ------- | ----------- | | Alice | Math | | Charlie | Science |

Explanation: Only students with a matching course_id in the courses table are included in the result.

2. LEFT JOIN (or LEFT OUTER JOIN):

A LEFT JOIN returns all rows from the left table (students), and the matched rows from the right table (courses). If there's no match, NULL values are returned for the columns from the right table.

SELECT students.name, courses.course_name
FROM students
LEFT JOIN courses ON students.course_id = courses.id;

Result: | name | course_name | | ------- | ----------- | | Alice | Math | | Bob | NULL | | Charlie | Science | | David | NULL |

Explanation: All rows from students are included. For Bob and David, there's no corresponding course, so NULL is returned for course_name.

3. RIGHT JOIN (or RIGHT OUTER JOIN):

A RIGHT JOIN returns all rows from the right table (courses), and the matched rows from the left table (students). If there's no match, NULL values are returned for the columns from the left table.

SELECT students.name, courses.course_name
FROM students
RIGHT JOIN courses ON students.course_id = courses.id;

Result: | name | course_name | | ------- | ----------- | | Alice | Math | | Charlie | Science | | NULL | History |

Explanation: All rows from the courses table are included. For the History course, there's no matching student, so NULL is returned for name.

4. FULL JOIN (or FULL OUTER JOIN):

A FULL JOIN returns all rows when there is a match in either table. If there is no match, NULL values are returned for the missing matches.

SELECT students.name, courses.course_name
FROM students
FULL JOIN courses ON students.course_id = courses.id;

Result: | name | course_name | | ------- | ----------- | | Alice | Math | | Bob | NULL | | Charlie | Science | | David | NULL | | NULL | History |

Explanation: All rows from both students and courses are included. Unmatched rows are filled with NULL.

Summary:

INNER JOIN: Returns matching rows from both tables.
LEFT JOIN: Returns all rows from the left table and matched rows from the right table. Unmatched rows from the right are NULL.
RIGHT JOIN: Returns all rows from the right table and matched rows from the left table. Unmatched rows from the left are NULL.
FULL JOIN: Returns all rows from both tables, with NULL in place of unmatched rows.

These are powerful tools for combining data from multiple tables in relational databases like PostgreSQL.

If you use only the word JOIN without specifying a type, it defaults to an INNER JOIN in PostgreSQL.

For example, the following query:

SELECT students.name, courses.course_name
FROM students
JOIN courses ON students.course_id = courses.id;

is equivalent to:

SELECT students.name, courses.course_name
FROM students
INNER JOIN courses ON students.course_id = courses.id;

This will return only the rows where there is a match in both tables, just like an INNER JOIN.

A self-join is a type of join where a table is joined with itself. This can be useful when you need to compare rows within the same table. To perform a self-join, you use table aliases to differentiate between the instances of the same table.

Example Scenario

Let's consider a table called employees that contains employee information, including their managers.

employees table: | id | name | manager_id | | --- | -------- | ---------- | | 1 | Alice | NULL | | 2 | Bob | 1 | | 3 | Charlie | 1 | | 4 | David | 2 | | 5 | Eve | 2 |

In this table: - Alice is the manager of Bob and Charlie. - Bob is the manager of David and Eve.

Example Query

To find each employee and their manager’s name, you would perform a self-join:

SELECT e1.name AS employee_name, e2.name AS manager_name
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.id;

Result: | employee_name | manager_name | | ------------- | ------------ | | Alice | NULL | | Bob | Alice | | Charlie | Alice | | David | Bob | | Eve | Bob |

Explanation

Table Aliases: e1 and e2 are aliases for the same employees table. e1 represents the employee, and e2 represents the manager.
JOIN Condition: e1.manager_id = e2.id specifies how rows from the table should be matched. It joins the rows where the manager_id of e1 matches the id of e2.

Why Use a Self-Join?

Self-joins are particularly useful when you need to:

Compare Rows Within the Same Table: For example, comparing employees with their managers.
Find Relationships or Hierarchies: Like organizational hierarchies, parent-child relationships, etc.
Aggregate Data: Such as finding the average salary of employees who report to a specific manager.

By using a self-join, you can effectively analyze and retrieve data based on relationships within a single table.

In SQL, the UNION operator is used to combine the results of two or more SELECT queries into a single result set. The key points about UNION are:

Combines Results: It combines rows from multiple SELECT queries into a single result set.
Eliminates Duplicates: By default, UNION removes duplicate rows from the result set.
Column Matching: The number of columns and their data types must match in all SELECT queries being combined.

Types of Unions

UNION:
Combines the results of two or more SELECT statements and removes duplicate rows.
Example:

sql SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2;
UNION ALL:
Combines the results of two or more SELECT statements but does not remove duplicate rows.
Example:

sql SELECT column1, column2 FROM table1 UNION ALL SELECT column1, column2 FROM table2;

Example Scenario

Suppose you have two tables: students_2023 and students_2024.

students_2023 table: | id | name | | --- | -------- | | 1 | Alice | | 2 | Bob |

students_2024 table: | id | name | | --- | -------- | | 2 | Bob | | 3 | Charlie |

Using `UNION`:

SELECT name FROM students_2023
UNION
SELECT name FROM students_2024;

Result: | name | | -------- | | Alice | | Bob | | Charlie |

Explanation: This query combines the names from both tables and removes duplicate values (Bob appears only once).

Using `UNION ALL`:

SELECT name FROM students_2023
UNION ALL
SELECT name FROM students_2024;

Result: | name | | -------- | | Alice | | Bob | | Bob | | Charlie |

Explanation: This query combines the names from both tables without removing duplicates, so Bob appears twice.

Important Considerations

Column Compatibility: Each SELECT query must have the same number of columns, and the data types of the columns must be compatible.
Column Names: The column names in the result set are taken from the first SELECT query.

Use Cases

Combining Results from Similar Tables: For example, if you have different tables for different years and want to get a combined list of entries.
Combining Data from Multiple Sources: For example, merging customer lists from different regions.

The UNION and UNION ALL operators are useful for aggregating data from multiple sources and can be used in various scenarios to consolidate information.

A subquery, also known as a nested query or inner query, is a query within another SQL query. Subqueries are used to perform operations that require multiple steps, such as filtering, aggregation, or complex joins. They allow you to use the result of one query as input for another query.

Types of Subqueries

Single-row Subquery: Returns a single row with a single value.
Multiple-row Subquery: Returns multiple rows.
Correlated Subquery: References columns from the outer query.
Non-correlated Subquery: Independent of the outer query.

Example Scenario

Consider two tables: employees and departments.

employees table: | id | name | department_id | salary | | --- | -------- | ------------- | ------ | | 1 | Alice | 1 | 70000 | | 2 | Bob | 2 | 80000 | | 3 | Charlie | 1 | 90000 | | 4 | David | 3 | 95000 | | 5 | Eve | 2 | 85000 |

departments table: | id | department_name | | --- | --------------- | | 1 | HR | | 2 | Finance | | 3 | IT |

Example 1: Single-row Subquery

Goal: Find the name of the employee with the highest salary.

SELECT name
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

Explanation: - The inner query (SELECT MAX(salary) FROM employees) finds the highest salary. - The outer query retrieves the name of the employee with that highest salary.

Result: | name | | ------- | | Charlie |

Example 2: Multiple-row Subquery

Goal: Find employees who work in the departments with more than one employee.

SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 1
);

Explanation: - The inner query finds department_ids where the count of employees is greater than one. - The outer query retrieves the names of employees who work in these departments.

Result: | name | | ------- | | Alice | | Bob | | Charlie | | Eve |

Example 3: Correlated Subquery

Goal: Find employees whose salary is greater than the average salary of their own department.

SELECT name, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e1.department_id = e2.department_id
);

Explanation: - The inner query calculates the average salary within the same department as the current row (e1). - The outer query retrieves employees whose salary is greater than the average salary in their department.

Result: | name | salary | | ------- | ------ | | Charlie | 90000 |

Key Points:

Single-row Subquery: Returns one value.
Multiple-row Subquery: Returns multiple rows.
Correlated Subquery: References the outer query, recalculating for each row.
Non-correlated Subquery: Independent of the outer query.

Subqueries can be very powerful for performing complex queries and filtering data based on conditions that are computed dynamically. They are an essential tool in SQL for handling multi-step data retrieval and manipulation tasks.

Common Table Expressions (CTEs) are a feature in SQL that allows you to create temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs are useful for breaking down complex queries into simpler parts, improving readability and maintainability. They can also be used to simplify recursive queries.

Basic Syntax

A CTE is defined using the WITH clause and is followed by a query that defines the CTE. You can then reference this CTE in the main query.

Syntax:

WITH cte_name AS (
    -- CTE Query
    SELECT column1, column2
    FROM table_name
    WHERE condition
)
-- Main Query
SELECT column1, column2
FROM cte_name
WHERE condition;

Example Scenario

Consider a database with an employees table:

employees table: | id | name | department | salary | | --- | -------- | ---------- | ------ | | 1 | Alice | HR | 70000 | | 2 | Bob | Finance | 80000 | | 3 | Charlie | HR | 90000 | | 4 | David | IT | 95000 | | 5 | Eve | Finance | 85000 |

Example 1: Using a CTE to Simplify a Query

Goal: Find the average salary in each department and list employees who earn more than the average salary in their department.

WITH DepartmentAverage AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN DepartmentAverage d ON e.department = d.department
WHERE e.salary > d.avg_salary;

Explanation: - CTE Definition (DepartmentAverage): Calculates the average salary for each department. - Main Query: Joins the employees table with the DepartmentAverage CTE to filter employees earning more than the average salary in their department.

Result: | name | salary | avg_salary | | ------- | ------ | ---------- | | Charlie | 90000 | 70000 | | Eve | 85000 | 80000 |

Example 2: Recursive CTE

Recursive CTEs are used to handle hierarchical or recursive data, such as organizational charts or tree structures.

Example: Suppose we have an employees table with a manager_id column indicating each employee's manager.

employees table: | id | name | manager_id | | --- | -------- | ---------- | | 1 | Alice | NULL | | 2 | Bob | 1 | | 3 | Charlie | 1 | | 4 | David | 2 | | 5 | Eve | 2 |

Goal: List all employees and their direct and indirect reports (recursive hierarchy).

WITH RECURSIVE EmployeeHierarchy AS (
    -- Anchor member: Select top-level employees
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: Select employees reporting to those in the previous level
    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
SELECT id, name, manager_id, level
FROM EmployeeHierarchy
ORDER BY level, manager_id;

Explanation: - CTE Definition: - Anchor Member: Selects employees with no managers (top-level). - Recursive Member: Selects employees who report to those in the previous level. - Main Query: Retrieves and orders employees based on their hierarchy level.

Result: | id | name | manager_id | level | | --- | -------- | ---------- | ----- | | 1 | Alice | NULL | 1 | | 2 | Bob | 1 | 2 | | 3 | Charlie | 1 | 2 | | 4 | David | 2 | 3 | | 5 | Eve | 2 | 3 |

Advantages of CTEs

Readability: Breaks down complex queries into simpler, more manageable parts.
Reusability: Allows you to reference the CTE multiple times within the same query.
Recursion: Enables recursive queries for hierarchical data.
Maintainability: Makes the SQL code easier to understand and maintain.

CTEs are a powerful feature in SQL that can help simplify complex queries and handle recursive data structures efficiently.

Example Setup:

1. INNER JOIN:

2. LEFT JOIN (or LEFT OUTER JOIN):

3. RIGHT JOIN (or RIGHT OUTER JOIN):

4. FULL JOIN (or FULL OUTER JOIN):

Summary:

Example Scenario

Example Query

Explanation

Why Use a Self-Join?

Types of Unions

Example Scenario

Using UNION:

Using UNION ALL:

Important Considerations

Use Cases

Types of Subqueries

Example Scenario

Example 1: Single-row Subquery

Example 2: Multiple-row Subquery

Example 3: Correlated Subquery

Key Points:

Basic Syntax

Example Scenario

Example 1: Using a CTE to Simplify a Query

Example 2: Recursive CTE

Advantages of CTEs

links

social

Using `UNION`:

Using `UNION ALL`: