📋

Description

The 'cardinality violation' error indicates that an SQL statement, often involving a subquery or a function, has returned more rows or values than the context expects. This typically arises when a scalar subquery (expected to return one value) or a subquery in a WHERE clause (expected to match one row) yields multiple results.

💬

Error Message

cardinality violation

🔍

Known Causes

3 known causes

⚠️

Scalar Subquery Returns Multiple Rows

A subquery designed to produce a single value or row unexpectedly returns multiple results, violating the scalar context.

⚠️

WHERE Clause Expects Single Match

A subquery within a WHERE clause, particularly with comparison operators like '=', returns multiple rows when only a single result is anticipated.

⚠️

Aggregate Function Misuse

An aggregate function is used in a context that expects a single result, but it processes multiple rows without an appropriate GROUP BY clause or LIMIT.

🛠️

Solutions

4 solutions available

1. Use EXISTS Instead of SELECT in WHERE Clause easy

Replace a subquery in a WHERE clause that returns multiple rows with the EXISTS operator for a boolean check.

1

Identify the subquery in your WHERE clause that is causing the cardinality violation. This subquery is likely comparing a value to the result of the subquery.

2

Rewrite the query to use the `EXISTS` operator. `EXISTS` returns true if the subquery returns at least one row, and false otherwise. This is suitable when you only need to check for the presence of related records.

-- Original query (problematic):
SELECT * FROM table_a a
WHERE a.column_x = (SELECT column_y FROM table_b b WHERE b.column_z = a.column_id);

-- Rewritten query using EXISTS:
SELECT * FROM table_a a
WHERE EXISTS (SELECT 1 FROM table_b b WHERE b.column_z = a.column_id);

2. Use a Scalar Subquery with Aggregate Functions medium

If the subquery is intended to return a single value (e.g., count, sum, average), ensure it uses an aggregate function and a `LIMIT 1` or `GROUP BY` clause if necessary to guarantee a single row result.

1

Examine the subquery. If its purpose is to return a single aggregated value, confirm that an aggregate function (like `COUNT()`, `SUM()`, `AVG()`, `MAX()`, `MIN()`) is being used.

2

If the subquery might still return multiple rows even with an aggregate function (e.g., if grouping is missing or incorrect), add a `GROUP BY` clause to ensure aggregation results in a single row for each group, or a `LIMIT 1` if you only want an arbitrary single value from the potential multiple results.

-- Example where a single row is expected for each 'a.column_id'
-- Original subquery might return multiple rows if 'column_z' is not unique for a given 'a.column_id'

-- Rewritten subquery with aggregation and LIMIT 1:
SELECT * FROM table_a a
WHERE a.column_x = (
    SELECT MAX(column_y) -- Or any other aggregate function
    FROM table_b b
    WHERE b.column_z = a.column_id
    LIMIT 1 -- Ensures only one value is returned, even if MAX is not truly unique
);

3. Use ANY or ALL for Set Comparisons medium

When comparing a value to a set of values returned by a subquery, use `ANY` or `ALL` to correctly handle multi-row results.

1

Identify if your subquery in the `WHERE` clause is intended to check if a value matches *any* of the values in the subquery's result set, or if it must match *all* of them.

2

Rewrite the comparison using `ANY` (equivalent to `= ANY`) if the value should match at least one row, or `ALL` if it must match all rows in the subquery's result.

-- Original query (problematic):
SELECT * FROM table_a a
WHERE a.column_x = (SELECT column_y FROM table_b b WHERE b.column_z = 'some_value');

-- Rewritten query using ANY:
-- Checks if a.column_x is equal to ANY of the values returned by the subquery
SELECT * FROM table_a a
WHERE a.column_x = ANY (SELECT column_y FROM table_b b WHERE b.column_z = 'some_value');

-- Rewritten query using ALL:
-- Checks if a.column_x is equal to ALL of the values returned by the subquery (less common)
SELECT * FROM table_a a
WHERE a.column_x = ALL (SELECT column_y FROM table_b b WHERE b.column_z = 'some_value');

4. Join Tables Instead of Using Scalar Subqueries medium

Convert subqueries in the `SELECT` list or `WHERE` clause to a `JOIN` for better performance and clarity when multiple rows are expected or can be handled relationally.

1

Analyze the subquery. If it's in the `SELECT` list and is intended to fetch related data, or in the `WHERE` clause to filter based on related data, consider if a `JOIN` would be more appropriate.

2

Rewrite the query using an appropriate `JOIN` type (`INNER JOIN`, `LEFT JOIN`, etc.) to combine the tables based on their relationship. This is often more performant than correlated subqueries.

-- Original query (problematic, subquery in SELECT):
SELECT
    a.id,
    (SELECT b.name FROM table_b b WHERE b.a_id = a.id LIMIT 1) AS b_name
FROM table_a a;

-- Rewritten query using LEFT JOIN:
SELECT
    a.id,
    b.name AS b_name
FROM table_a a
LEFT JOIN table_b b ON b.a_id = a.id;

-- Original query (problematic, subquery in WHERE returning multiple rows):
SELECT * FROM table_a a
WHERE a.id IN (SELECT b.a_id FROM table_b b WHERE b.status = 'active');

-- Rewritten query using INNER JOIN:
SELECT DISTINCT a.*
FROM table_a a
INNER JOIN table_b b ON a.id = b.a_id
WHERE b.status = 'active';

🔗

Related Errors

5 related errors

PostgreSQL Error 21000: Subquery Returns Multiple Rows