Querying data

The process of retrieving or the command to retrieve data from a database is called a query. In SQL the SELECT command is used to specify queries.

SELECT retrieves rows from zero or more tables. The general processing of SELECT is as follows:

  1. All queries in the WITH list are computed. This effectively serve as temporary tables that can be referenced in the FROM list.

  2. All elements in the FROM list are computed. If more than one element is specified in the FROM list, they are cross-joined together.

  3. If WHERE clause is specified, all rows that do not satisfy the condition are eliminated from the output.

  4. If the GROUP BY clause is specified, or if there are aggregate function calls, the output is combined into groups of rows that match on one or more values, and the result of aggregate functions are computed. If the HAVING clause is present, it eliminates groups that does not satisfy the given condition.

  5. The actual output rows are computed using the SELECT output expressions for each row or row of group.

  6. SELECT DISTINCT eliminates duplicate rows from the result. SELECT DISTINCT ON eliminates rows that match on all the specified expressions.

  7. Using the operators UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined to form a single result set.

    • The UNION operator returns all rows that are in one or both result sets.

    • The INTERSECT operator returns all rows that are strictly in both result sets.

    • The EXCEPT operator returns the rows that are in the first result set, but not in the second.

  8. If the ORDER BY clause is specified, the returned rows are sorted in the specific order.

  9. if the LIMIT or OFFSET clause is specified, the SELECT statement only returns a subset of the result rows.

  10. If FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE or FOR KEY SHARE is specified, the SELECT statement locks the selected rows against concurrent updates.

Select rows from a table

A simple kind of query has the form:

SELECT * FROM publisher;

This command would retrieve all rows and all user-defined columns from publisher table. A select list can also select a subset of the available columns or make calculations using the columns.

SELECT title, page_count / 2
FROM book;

FROM clause is a simple kind of table expression: it reads just one table. In general, table expression can be complex constructs of base tables, joins and sub-queries. But it is possible to omit the table expression entirely and use the SELECT command as a calculator (or in some another way):

SELECT 40 + 2;

The FROM clause derives a table from one or more other tables given in a comma-separated table reference list. A table reference can a table name, or derived table such as sub-query, a JOIN construct, or complex combinations of these. If more than one table reference is listed in the FROM clause, the table are cross-joined. The result of the FROM list is an intermediate virtual table which can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

Search conditions

After the processing of the FROM clause is done, each row of the derived virtual table can be checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise it is discarded. The WHERE clause is used to handle search condition checks.

Here are some examples of WHERE clause:

WHERE clause usage examples
SELECT ... FROM tbl
WHERE c1 > 5;

SELECT ... FROM tbl
WHERE c1 IN (1, 2, 3);

SELECT ... FROM tbl
WHERE c1 IN (SELECT c1 FROM t2);

SELECT ... FROM tbl
WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = tbl.c1 + 10);

SELECT ... FROM tbl
WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = tbl.c1 + 10) AND 100;

SELECT ... FROM tbl
WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1);

Grouping data

After passing the WHERE filter, the derived input table might be subject to grouping, using GROUP BY clause, and elimination of group rows using the HAVING clause. The GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed. The order in the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.

Select lists

As shown is the previous section, the table expression in the SELECT command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.

Select list items

The simplest kind of select is * which emits that the table expression produces. Otherwise, a select list is a comma-separated list of value expressions. For instance:

SELECT first_name, last_name
FROM author;

Columns labels

The entries listed in the select list can be assigned names for subsequent processing.

SELECT title, page_count / 2 as sheets FROM book;

If no output column name is specified using AS, the system will assign a default column name. For simple column references, this is the name of the referenced column. For function calls, this is the name of the function. For complex expressions, the system will generate a generic name.

Note

Labels are often referenced as “aliases”.

DISTINCT

After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT key word is written directly after SELECT to specify this:

SELECT DISTINCT publisher FROM book;

Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in comparison. Alternatively, an arbitrary expression can determine what rows are to be considered distinct.

SELECT DISTINCT ON (expression [, expression ...]) select_list ...

Sorting rows

After a query has produced an output table it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in this case will depend on the scan and join plan types and the order on disk, but it must not be relied on.

The ORDER BY clause specifies the sort order:

SELECT title, publisher_id FROM book ORDER BY title;

The sort expression(s) can be any expression that would be valid in the query’s select list.

When more than one expression is specified, the later values are used to sort rows that are equal according to the earlier values. Each expression can be followed by an option ASC or DESC keyword to set the sort direction to ascending or descending. ASC order is the default.

The NULL FIRST and NULL LAST options can be used to determine whether null appear before or after non-null values in the sort ordering. By default, null values sort as if larger than any non-null value; that’s, NULL FIRST is the default for DESC order, and NULL LAST otherwise.

Note that the ordering options are considered independently for each sort column. For example ORDER BY x, y DESC means ORDER BY x ASC, y DESC, which is not the same as ORDER BY x DESC, y DESC.

A sort expression can also be the column label or number of an output column.

ORDER BY can be applied to the result of a UNION, INTERSECT, or EXCEPT combination, but in this case it is only permitted to sort by output column names or numbers, not by expressions.

Limit and Offset

LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generated by the rest of the query.

SELECT * FROM publisher LIMIT 10;

The query above fetches only first 10 rows from the “publisher” table.

If a limit count is given, no more than that many rows will be returned. But possibly fewer, if the query itself yields fewer rows. LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT with NULL argument.

OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument.

SELECT * FROM publisher LIMIT 10 OFFSET 10;

The query above fetches 10 publishers starting from the 11-th one.

When using LIMIT and/or OFFSET, it’s important to use ORDER BY clause that constraints the result rows into a unique order. Otherwise you will get an unpredictable subset of the query’s rows.

WITH queries

WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as CTE, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE or DELETE; and the WITH clause itself is attached to a primary statement that can be a SELECT, INSERT, UPDATE, DELETE or MERGE.

Select in WITH

The basic value of SELECT in WITH is to break down complicated queries into simpler parts.

Get most profitable months
-- select top revenue months
WITH monthly_revenue AS (SELECT EXTRACT(YEAR FROM date)  AS year,
                                EXTRACT(MONTH FROM date) AS month,
                                SUM(amount)              AS total_amount
                         FROM revenue
                         GROUP BY year, month
                         ORDER BY year, month)
SELECT *
FROM monthly_revenue
WHERE total_amount > (SELECT SUM(total_amount) / 100 FROM monthly_revenue)
ORDER BY total_amount DESC;

Recursive queries

The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output.

Sum of first 10 numbers starting from 1
WITH RECURSIVE t(n) AS (VALUES (1) UNION SELECT n + 1 FROM t WHERE n < 10)
SELECT sum(n)
FROM t;

Important

When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuple, or else the query will loop indefinitely.

Data modifying statements in WITH

You can use most data-modifying statements (INSERT, UPDATE, DELETE) in WITH. This allows you to perform several different operations at a time, in the same query.

Remove old revenue records and store them into history table
WITH deleted_rows AS (
    DELETE FROM revenue
        WHERE date < '2020-01-01'
        RETURNING *)
INSERT
INTO revenue_log
SELECT *
FROM deleted_rows;

Combining queries

The results of two queries can be combined using the set operations union, intersection, and difference.

query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2

UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.

WITH north_lib_member AS
         (SELECT *
          FROM (VALUES (1, 'Mariska', 'Syson'),
                       (2, 'Robenia', 'Dimitriades'),
                       (3, 'Loren', 'Goretti'),
                       (4, 'Dianna', 'Giacomuzzo'),
                       (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
     south_lib_member AS
         (SELECT *
          FROM (VALUES (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov'),
                       (7, 'Say', 'Kliner'),
                       (8, 'Charles', 'Ede'),
                       (9, 'Florentia', 'Core'),
                       (10, 'Tiffany', 'Tingly'))
                   AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
UNION
SELECT *
FROM south_lib_member
ORDER BY id;

id

first_name

last_name

1

Mariska

Syson

2

Robenia

Dimitriades

3

Loren

Goretti

4

Dianna

Giacomuzzo

5

Anett

Hayland

6

Keeley

Grishelyov

7

Say

Kliner

8

Charles

Ede

9

Florentia

Core

10

Tiffany

Tingly

INTERSECT returns all rows that are both in the result of query1 and in the result of query2. Duplicate rows are eliminated unless INTERSECT ALL is used.

WITH north_lib_member AS
         (SELECT *
          FROM (VALUES (1, 'Mariska', 'Syson'),
                       (2, 'Robenia', 'Dimitriades'),
                       (3, 'Loren', 'Goretti'),
                       (4, 'Dianna', 'Giacomuzzo'),
                       (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
     south_lib_member AS
         (SELECT *
          FROM (VALUES (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov'),
                       (7, 'Say', 'Kliner'),
                       (8, 'Charles', 'Ede'),
                       (9, 'Florentia', 'Core'),
                       (10, 'Tiffany', 'Tingly'))
                   AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
INTERSECT
SELECT *
FROM south_lib_member
ORDER BY id;

id

first_name

last_name

5

Anett

Hayland

6

Keeley

Grishelyov

EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT ALL is used.

WITH north_lib_member AS
         (SELECT *
          FROM (VALUES (1, 'Mariska', 'Syson'),
                       (2, 'Robenia', 'Dimitriades'),
                       (3, 'Loren', 'Goretti'),
                       (4, 'Dianna', 'Giacomuzzo'),
                       (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
     south_lib_member AS
         (SELECT *
          FROM (VALUES (5, 'Anett', 'Hayland'),
                       (6, 'Keeley', 'Grishelyov'),
                       (7, 'Say', 'Kliner'),
                       (8, 'Charles', 'Ede'),
                       (9, 'Florentia', 'Core'),
                       (10, 'Tiffany', 'Tingly'))
                   AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
EXCEPT
SELECT *
FROM south_lib_member
ORDER BY id;

id

first_name

last_name

1

Mariska

Syson

2

Robenia

Dimitriades

3

Loren

Goretti

4

Dianna

Giacomuzzo

Join data from different tables

A JOIN clause combines two FROM items, which for convenience we will refer as “tables”, though in reality they can be any type of FROM items. Available join types are:

  • [INNER] JOIN

  • LEFT [OUTER] JOIN

  • RIGHT [OUTER] JOIN

  • FULL [OUTER] JOIN

Note

Key words in bracket parentheses above are optional.

Joins between tables

Publishers and books tables definitions
CREATE TABLE publisher
(
    id      INTEGER PRIMARY KEY,
    name    VARCHAR(128) NOT NULL CHECK (LENGTH(name) > 0),
    website VARCHAR(255) UNIQUE,
    email   VARCHAR(255),
    phone   VARCHAR(32) UNIQUE
);

CREATE TABLE book
(
    id               SERIAL PRIMARY KEY,
    title            VARCHAR(255) NOT NULL,
    synopsis         TEXT,
    isbn             VARCHAR(16),
    publisher_id     INTEGER      NOT NULL REFERENCES publisher ON DELETE RESTRICT,
    publication_date DATE,
    language         VARCHAR(64),
    page_count       INTEGER,
    keywords         TEXT
);

Following tables are related via FOREIGN KEY constraint. Each record in the book table references some record in the publisher table.

Important

Since original tables contain lots of data, temporary tables will be created instead before join query. Full tables content may be downloaded via links: publisher, book.

The subsets from original queries defined as:

SELECT *
FROM (VALUES (1, 'Zoonoodle'),
             (3, 'Tanoodle'),
             (4, 'Skivee'))
         AS publisher(id, name);

id

name

1

Zoonoodle

3

Tanoodle

SELECT *
FROM (VALUES (22, 'pretium nisl', 2),
             (38, 'ac est lacinia nisi venenatis', 3),
             (69, 'vel accumsan tellus nisi', 3),
             (72, 'nam dui proin', 2),
             (85, 'nam tristique', 2))
         AS book(id, title, publisher_id);

id

title

publisher_id

22

pretium nisl

2

38

ac est lacinia nisi venenatis

3

69

vel accumsan tellus nisi

3

72

nam dui proin

2

85

nam tristique

2

By passing multiple tables to the FROM clause a CROSS JOIN query is executed. This means the resulting table is the Cartesian product of two original tables.

Cross join (Cartesian product)
WITH publisher AS
         (SELECT *
          FROM (VALUES (1, 'Zoonoodle'),
                       (3, 'Tanoodle'),
                       (4, 'Skivee'))
              AS publisher(id, name)),
     book AS
         (SELECT *
          FROM (VALUES (22, 'pretium nisl', 2),
                       (38, 'ac est lacinia nisi venenatis', 3),
                       (69, 'vel accumsan tellus nisi', 3),
                       (72, 'nam dui proin', 2),
                       (85, 'nam tristique', 2))
                   AS book(id, title, publisher_id))
-- querying the joined result
SELECT *
FROM book,
     publisher;
Query result

id

title

publisher_id

id

name

22

pretium nisl

2

1

Zoonoodle

22

pretium nisl

2

3

Tanoodle

22

pretium nisl

2

4

Skivee

38

ac est lacinia nisi venenatis

3

1

Zoonoodle

38

ac est lacinia nisi venenatis

3

3

Tanoodle

38

ac est lacinia nisi venenatis

3

4

Skivee

69

vel accumsan tellus nisi

3

1

Zoonoodle

69

vel accumsan tellus nisi

3

3

Tanoodle

69

vel accumsan tellus nisi

3

4

Skivee

72

nam dui proin

2

1

Zoonoodle

72

nam dui proin

2

3

Tanoodle

72

nam dui proin

2

4

Skivee

85

nam tristique

2

1

Zoonoodle

85

nam tristique

2

3

Tanoodle

85

nam tristique

2

4

Skivee

By defining the WHERE clause rows that do not match the condition may be eliminated from the result.

SELECT *
FROM book, publisher
WHERE book.publisher_id = publisher.id;

id

title

publisher_id

id

name

38

ac est lacinia nisi venenatis

3

3

Tanoodle

69

vel accumsan tellus nisi

3

3

Tanoodle

This is called INNER JOIN, which means that the result table will contain only whose rows from left-hard and right-hand tables, that have corresponding records to each other.

The same result may be obtained using JOIN clause. All JOIN options are just a notation convenience, since they do nothing you couldn’t do with plain FROM and WHERE.

SELECT *
FROM book b
         JOIN publisher p ON b.publisher_id = p.id;

INNER JOIN is the default while joining two tables with each other.

Right and left joins

LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition.

SELECT *
FROM book b
         LEFT JOIN publisher p ON b.publisher_id = p.id;

id

title

publisher_id

id

name

22

pretium nisl

2

null

null

38

ac est lacinia nisi venenatis

3

3

Tanoodle

69

vel accumsan tellus nisi

3

3

Tanoodle

72

nam dui proin

2

null

null

85

nam tristique

2

null

null

Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched right-hand row. This is just a notation convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right tables.

SELECT *
FROM book b
         RIGHT JOIN publisher p ON b.publisher_id = p.id;

id

title

publisher_id

id

name

38

ac est lacinia nisi venenatis

3

3

Tanoodle

69

vel accumsan tellus nisi

3

3

Tanoodle

null

null

null

4

Skivee

null

null

null

1

Zoonoodle

Full outer join

SELECT *
FROM book b
         FULL JOIN publisher p ON b.publisher_id = p.id;

id

title

publisher_id

id

name

22

pretium nisl

2

null

null

38

ac est lacinia nisi venenatis

3

3

Tanoodle

69

vel accumsan tellus nisi

3

3

Tanoodle

72

nam dui proin

2

null

null

85

nam tristique

2

null

null

null

null

null

4

Skivee

null

null

null

1

Zoonoodle