Querying data
The process of retrieving or the command to retrieve data from a database is
called a query. In SQL the SELECT
command is used to specify queries.
SELECT
retrieves rows from zero or more tables. The general processing of
SELECT
is as follows:
All queries in the
WITH
list are computed. This effectively serve as temporary tables that can be referenced in theFROM
list.All elements in the
FROM
list are computed. If more than one element is specified in theFROM
list, they are cross-joined together.If
WHERE
clause is specified, all rows that do not satisfy the condition are eliminated from the output.If the
GROUP BY
clause is specified, or if there are aggregate function calls, the output is combined into groups of rows that match on one or more values, and the result of aggregate functions are computed. If theHAVING
clause is present, it eliminates groups that does not satisfy the given condition.The actual output rows are computed using the
SELECT
output expressions for each row or row of group.SELECT DISTINCT
eliminates duplicate rows from the result.SELECT DISTINCT ON
eliminates rows that match on all the specified expressions.Using the operators
UNION
,INTERSECT
, andEXCEPT
, the output of more than oneSELECT
statement can be combined to form a single result set.The
UNION
operator returns all rows that are in one or both result sets.The
INTERSECT
operator returns all rows that are strictly in both result sets.The
EXCEPT
operator returns the rows that are in the first result set, but not in the second.
If the
ORDER BY
clause is specified, the returned rows are sorted in the specific order.if the
LIMIT
orOFFSET
clause is specified, theSELECT
statement only returns a subset of the result rows.If
FOR UPDATE
,FOR NO KEY UPDATE
,FOR SHARE
orFOR KEY SHARE
is specified, theSELECT
statement locks the selected rows against concurrent updates.
Select rows from a table
A simple kind of query has the form:
SELECT * FROM publisher;
This command would retrieve all rows and all user-defined columns from
publisher
table. A select list can also select a subset of the available
columns or make calculations using the columns.
SELECT title, page_count / 2
FROM book;
FROM
clause is a simple kind of table expression: it reads just one table.
In general, table expression can be complex constructs of base tables, joins
and sub-queries. But it is possible to omit the table expression entirely and
use the SELECT
command as a calculator (or in some another way):
SELECT 40 + 2;
The FROM
clause derives a table from one or more other tables given in
a comma-separated table reference list. A table reference can a table name,
or derived table such as sub-query, a JOIN
construct, or complex
combinations of these. If more than one table reference is listed in the
FROM
clause, the table are cross-joined. The result of the FROM
list
is an intermediate virtual table which can then be subject to transformations
by the WHERE
, GROUP BY
, and HAVING
clauses and is finally the
result of the overall table expression.
Search conditions
After the processing of the FROM
clause is done, each row of the derived
virtual table can be checked against the search condition. If the result of
the condition is true
, the row is kept in the output table, otherwise it
is discarded. The WHERE
clause is used to handle search condition checks.
Here are some examples of WHERE
clause:
SELECT ... FROM tbl
WHERE c1 > 5;
SELECT ... FROM tbl
WHERE c1 IN (1, 2, 3);
SELECT ... FROM tbl
WHERE c1 IN (SELECT c1 FROM t2);
SELECT ... FROM tbl
WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = tbl.c1 + 10);
SELECT ... FROM tbl
WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = tbl.c1 + 10) AND 100;
SELECT ... FROM tbl
WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1);
Grouping data
After passing the WHERE
filter, the derived input table might be subject
to grouping, using GROUP BY
clause, and elimination of group rows using
the HAVING
clause. The GROUP BY
clause is used to group together those
rows in a table that have the same values in all the columns listed. The order
in the columns are listed does not matter. The effect is to combine each set of
rows having common values into one group row that represents all rows in the
group. This is done to eliminate redundancy in the output and/or compute
aggregates that apply to these groups.
Select lists
As shown is the previous section, the table expression in the SELECT
command constructs an intermediate virtual table by possibly combining tables,
views, eliminating rows, grouping etc. This table is finally passed on to
processing by the select list. The select list determines which columns
of the intermediate table are actually output.
Select list items
The simplest kind of select is *
which emits that the table expression
produces. Otherwise, a select list is a comma-separated list of value
expressions. For instance:
SELECT first_name, last_name
FROM author;
Columns labels
The entries listed in the select list can be assigned names for subsequent processing.
SELECT title, page_count / 2 as sheets FROM book;
If no output column name is specified using AS
, the system will assign
a default column name. For simple column references, this is the name of
the referenced column. For function calls, this is the name of the function.
For complex expressions, the system will generate a generic name.
Note
Labels are often referenced as “aliases”.
DISTINCT
After the select list has been processed, the result table can optionally be
subject to the elimination of duplicate rows. The DISTINCT
key word is
written directly after SELECT
to specify this:
SELECT DISTINCT publisher FROM book;
Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in comparison. Alternatively, an arbitrary expression can determine what rows are to be considered distinct.
SELECT DISTINCT ON (expression [, expression ...]) select_list ...
Sorting rows
After a query has produced an output table it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in this case will depend on the scan and join plan types and the order on disk, but it must not be relied on.
The ORDER BY
clause specifies the sort order:
SELECT title, publisher_id FROM book ORDER BY title;
The sort expression(s) can be any expression that would be valid in the query’s select list.
When more than one expression is specified, the later values are used to sort
rows that are equal according to the earlier values. Each expression can be
followed by an option ASC
or DESC
keyword to set the sort direction
to ascending or descending. ASC
order is the default.
The NULL FIRST
and NULL LAST
options can be used to determine whether
null appear before or after non-null values in the sort ordering. By default,
null values sort as if larger than any non-null value; that’s, NULL FIRST
is the default for DESC
order, and NULL LAST
otherwise.
Note that the ordering options are considered independently for each sort
column. For example ORDER BY x, y DESC
means ORDER BY x ASC, y DESC
,
which is not the same as ORDER BY x DESC, y DESC
.
A sort expression can also be the column label or number of an output column.
ORDER BY
can be applied to the result of a UNION
, INTERSECT
, or
EXCEPT
combination, but in this case it is only permitted to sort by
output column names or numbers, not by expressions.
Limit and Offset
LIMIT
and OFFSET
allow you to retrieve just a portion of the rows that
are generated by the rest of the query.
SELECT * FROM publisher LIMIT 10;
The query above fetches only first 10 rows from the “publisher” table.
If a limit count is given, no more than that many rows will be returned.
But possibly fewer, if the query itself yields fewer rows.
LIMIT ALL
is the same as omitting the LIMIT
clause, as is LIMIT
with NULL
argument.
OFFSET
says to skip that many rows before beginning to return rows.
OFFSET 0
is the same as omitting the OFFSET
clause, as is OFFSET
with a NULL
argument.
SELECT * FROM publisher LIMIT 10 OFFSET 10;
The query above fetches 10 publishers starting from the 11-th one.
When using LIMIT
and/or OFFSET
, it’s important to use ORDER BY
clause that constraints the result rows into a unique order. Otherwise you
will get an unpredictable subset of the query’s rows.
WITH queries
WITH
provides a way to write auxiliary statements for use in a larger
query. These statements, which are often referred to as
CTE, can be thought of as defining temporary
tables that exist just for one query. Each auxiliary statement in a WITH
clause can be a SELECT
, INSERT
, UPDATE
or DELETE
; and the
WITH
clause itself is attached to a primary statement that can be a
SELECT
, INSERT
, UPDATE
, DELETE
or MERGE
.
Select in WITH
The basic value of SELECT
in WITH
is to break down complicated queries
into simpler parts.
-- select top revenue months
WITH monthly_revenue AS (SELECT EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
SUM(amount) AS total_amount
FROM revenue
GROUP BY year, month
ORDER BY year, month)
SELECT *
FROM monthly_revenue
WHERE total_amount > (SELECT SUM(total_amount) / 100 FROM monthly_revenue)
ORDER BY total_amount DESC;
Recursive queries
The optional RECURSIVE
modifier changes WITH from a mere syntactic
convenience into a feature that accomplishes things not otherwise possible in
standard SQL. Using RECURSIVE
, a WITH
query can refer to its own
output.
WITH RECURSIVE t(n) AS (VALUES (1) UNION SELECT n + 1 FROM t WHERE n < 10)
SELECT sum(n)
FROM t;
Important
When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuple, or else the query will loop indefinitely.
Data modifying statements in WITH
You can use most data-modifying statements (INSERT
, UPDATE
, DELETE
)
in WITH
. This allows you to perform several different operations at a time,
in the same query.
WITH deleted_rows AS (
DELETE FROM revenue
WHERE date < '2020-01-01'
RETURNING *)
INSERT
INTO revenue_log
SELECT *
FROM deleted_rows;
Combining queries
The results of two queries can be combined using the set operations union, intersection, and difference.
query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2
UNION
effectively appends the result of query2
to the result of
query1
(although there is no guarantee that this is the order in which
the rows are actually returned). Furthermore, it eliminates duplicate rows from
its result, in the same way as DISTINCT
, unless UNION ALL
is used.
WITH north_lib_member AS
(SELECT *
FROM (VALUES (1, 'Mariska', 'Syson'),
(2, 'Robenia', 'Dimitriades'),
(3, 'Loren', 'Goretti'),
(4, 'Dianna', 'Giacomuzzo'),
(5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
south_lib_member AS
(SELECT *
FROM (VALUES (5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov'),
(7, 'Say', 'Kliner'),
(8, 'Charles', 'Ede'),
(9, 'Florentia', 'Core'),
(10, 'Tiffany', 'Tingly'))
AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
UNION
SELECT *
FROM south_lib_member
ORDER BY id;
id |
first_name |
last_name |
---|---|---|
1 |
Mariska |
Syson |
2 |
Robenia |
Dimitriades |
3 |
Loren |
Goretti |
4 |
Dianna |
Giacomuzzo |
5 |
Anett |
Hayland |
6 |
Keeley |
Grishelyov |
7 |
Say |
Kliner |
8 |
Charles |
Ede |
9 |
Florentia |
Core |
10 |
Tiffany |
Tingly |
INTERSECT
returns all rows that are both in the result of query1
and in
the result of query2
. Duplicate rows are eliminated unless
INTERSECT ALL
is used.
WITH north_lib_member AS
(SELECT *
FROM (VALUES (1, 'Mariska', 'Syson'),
(2, 'Robenia', 'Dimitriades'),
(3, 'Loren', 'Goretti'),
(4, 'Dianna', 'Giacomuzzo'),
(5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
south_lib_member AS
(SELECT *
FROM (VALUES (5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov'),
(7, 'Say', 'Kliner'),
(8, 'Charles', 'Ede'),
(9, 'Florentia', 'Core'),
(10, 'Tiffany', 'Tingly'))
AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
INTERSECT
SELECT *
FROM south_lib_member
ORDER BY id;
id |
first_name |
last_name |
---|---|---|
5 |
Anett |
Hayland |
6 |
Keeley |
Grishelyov |
EXCEPT
returns all rows that are in the result of query1
but not in
the result of query2
. (This is sometimes called the difference between two
queries.) Again, duplicates are eliminated unless EXCEPT ALL
is used.
WITH north_lib_member AS
(SELECT *
FROM (VALUES (1, 'Mariska', 'Syson'),
(2, 'Robenia', 'Dimitriades'),
(3, 'Loren', 'Goretti'),
(4, 'Dianna', 'Giacomuzzo'),
(5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov')) AS tbl(id, first_name, last_name)),
south_lib_member AS
(SELECT *
FROM (VALUES (5, 'Anett', 'Hayland'),
(6, 'Keeley', 'Grishelyov'),
(7, 'Say', 'Kliner'),
(8, 'Charles', 'Ede'),
(9, 'Florentia', 'Core'),
(10, 'Tiffany', 'Tingly'))
AS tbl(id, first_name, last_name))
SELECT *
FROM north_lib_member
EXCEPT
SELECT *
FROM south_lib_member
ORDER BY id;
id |
first_name |
last_name |
---|---|---|
1 |
Mariska |
Syson |
2 |
Robenia |
Dimitriades |
3 |
Loren |
Goretti |
4 |
Dianna |
Giacomuzzo |
Join data from different tables
A JOIN
clause combines two FROM
items, which for convenience we will
refer as “tables”, though in reality they can be any type of FROM
items.
Available join types are:
[INNER] JOIN
LEFT [OUTER] JOIN
RIGHT [OUTER] JOIN
FULL [OUTER] JOIN
Note
Key words in bracket parentheses above are optional.
Joins between tables
CREATE TABLE publisher
(
id INTEGER PRIMARY KEY,
name VARCHAR(128) NOT NULL CHECK (LENGTH(name) > 0),
website VARCHAR(255) UNIQUE,
email VARCHAR(255),
phone VARCHAR(32) UNIQUE
);
CREATE TABLE book
(
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
synopsis TEXT,
isbn VARCHAR(16),
publisher_id INTEGER NOT NULL REFERENCES publisher ON DELETE RESTRICT,
publication_date DATE,
language VARCHAR(64),
page_count INTEGER,
keywords TEXT
);
Following tables are related via FOREIGN KEY
constraint. Each record in
the book
table references some record in the publisher
table.
Important
Since original tables contain lots of data, temporary tables will be
created instead before join query. Full tables content may be downloaded
via links:
publisher
,
book
.
The subsets from original queries defined as:
SELECT *
FROM (VALUES (1, 'Zoonoodle'),
(3, 'Tanoodle'),
(4, 'Skivee'))
AS publisher(id, name);
id |
name |
---|---|
1 |
Zoonoodle |
3 |
Tanoodle |
SELECT *
FROM (VALUES (22, 'pretium nisl', 2),
(38, 'ac est lacinia nisi venenatis', 3),
(69, 'vel accumsan tellus nisi', 3),
(72, 'nam dui proin', 2),
(85, 'nam tristique', 2))
AS book(id, title, publisher_id);
id |
title |
publisher_id |
---|---|---|
22 |
pretium nisl |
2 |
38 |
ac est lacinia nisi venenatis |
3 |
69 |
vel accumsan tellus nisi |
3 |
72 |
nam dui proin |
2 |
85 |
nam tristique |
2 |
By passing multiple tables to the FROM
clause a CROSS JOIN
query is
executed. This means the resulting table is the Cartesian product of two
original tables.
WITH publisher AS
(SELECT *
FROM (VALUES (1, 'Zoonoodle'),
(3, 'Tanoodle'),
(4, 'Skivee'))
AS publisher(id, name)),
book AS
(SELECT *
FROM (VALUES (22, 'pretium nisl', 2),
(38, 'ac est lacinia nisi venenatis', 3),
(69, 'vel accumsan tellus nisi', 3),
(72, 'nam dui proin', 2),
(85, 'nam tristique', 2))
AS book(id, title, publisher_id))
-- querying the joined result
SELECT *
FROM book,
publisher;
id |
title |
publisher_id |
id |
name |
---|---|---|---|---|
22 |
pretium nisl |
2 |
1 |
Zoonoodle |
22 |
pretium nisl |
2 |
3 |
Tanoodle |
22 |
pretium nisl |
2 |
4 |
Skivee |
38 |
ac est lacinia nisi venenatis |
3 |
1 |
Zoonoodle |
38 |
ac est lacinia nisi venenatis |
3 |
3 |
Tanoodle |
38 |
ac est lacinia nisi venenatis |
3 |
4 |
Skivee |
69 |
vel accumsan tellus nisi |
3 |
1 |
Zoonoodle |
69 |
vel accumsan tellus nisi |
3 |
3 |
Tanoodle |
69 |
vel accumsan tellus nisi |
3 |
4 |
Skivee |
72 |
nam dui proin |
2 |
1 |
Zoonoodle |
72 |
nam dui proin |
2 |
3 |
Tanoodle |
72 |
nam dui proin |
2 |
4 |
Skivee |
85 |
nam tristique |
2 |
1 |
Zoonoodle |
85 |
nam tristique |
2 |
3 |
Tanoodle |
85 |
nam tristique |
2 |
4 |
Skivee |
By defining the WHERE
clause rows that do not match the condition may
be eliminated from the result.
SELECT *
FROM book, publisher
WHERE book.publisher_id = publisher.id;
id |
title |
publisher_id |
id |
name |
---|---|---|---|---|
38 |
ac est lacinia nisi venenatis |
3 |
3 |
Tanoodle |
69 |
vel accumsan tellus nisi |
3 |
3 |
Tanoodle |
This is called INNER JOIN
, which means that the result table will contain
only whose rows from left-hard and right-hand tables, that have corresponding
records to each other.
The same result may be obtained using JOIN
clause. All JOIN
options are
just a notation convenience, since they do nothing you couldn’t do with plain
FROM
and WHERE
.
SELECT *
FROM book b
JOIN publisher p ON b.publisher_id = p.id;
INNER JOIN
is the default while joining two tables with each other.
Right and left joins
LEFT OUTER JOIN
returns all rows in the qualified Cartesian product
(i.e., all combined rows that pass its condition), plus one copy of each
row in the left-hand table for which there was no right-hand row that passed
the join condition.
SELECT *
FROM book b
LEFT JOIN publisher p ON b.publisher_id = p.id;
id |
title |
publisher_id |
id |
name |
---|---|---|---|---|
22 |
pretium nisl |
2 |
null |
null |
38 |
ac est lacinia nisi venenatis |
3 |
3 |
Tanoodle |
69 |
vel accumsan tellus nisi |
3 |
3 |
Tanoodle |
72 |
nam dui proin |
2 |
null |
null |
85 |
nam tristique |
2 |
null |
null |
Conversely, RIGHT OUTER JOIN
returns all the joined rows, plus one row for
each unmatched right-hand row. This is just a notation convenience, since you
could convert it to a LEFT OUTER JOIN
by switching the left and right
tables.
SELECT *
FROM book b
RIGHT JOIN publisher p ON b.publisher_id = p.id;
id |
title |
publisher_id |
id |
name |
---|---|---|---|---|
38 |
ac est lacinia nisi venenatis |
3 |
3 |
Tanoodle |
69 |
vel accumsan tellus nisi |
3 |
3 |
Tanoodle |
null |
null |
null |
4 |
Skivee |
null |
null |
null |
1 |
Zoonoodle |
Full outer join
SELECT *
FROM book b
FULL JOIN publisher p ON b.publisher_id = p.id;
id |
title |
publisher_id |
id |
name |
---|---|---|---|---|
22 |
pretium nisl |
2 |
null |
null |
38 |
ac est lacinia nisi venenatis |
3 |
3 |
Tanoodle |
69 |
vel accumsan tellus nisi |
3 |
3 |
Tanoodle |
72 |
nam dui proin |
2 |
null |
null |
85 |
nam tristique |
2 |
null |
null |
null |
null |
null |
4 |
Skivee |
null |
null |
null |
1 |
Zoonoodle |