Error
Error Code: 22021

PostgreSQL Error 22021: Invalid Character Encoding

📦 PostgreSQL
📋

Description

The 'character not in repertoire' error (22021) in PostgreSQL indicates that a character in the data being processed cannot be represented by the current character set encoding. This typically happens when data containing characters from one encoding (e.g., UTF-8) is inserted into a database or column configured for a different, more restrictive encoding (e.g., LATIN1), or vice-versa.
💬

Error Message

character not in repertoire
🔍

Known Causes

3 known causes
⚠️
Database Encoding Mismatch
The character set of the data being inserted or processed does not match the encoding configured for the PostgreSQL database or a specific column.
⚠️
Client Connection Encoding
The character set encoding of the client application connecting to PostgreSQL differs from the database's encoding, leading to incorrect character interpretation.
⚠️
Invalid Character Data
Input data contains characters that are corrupted, malformed, or simply not valid for *any* character set, making them unrepresentable.
🛠️

Solutions

3 solutions available

1. Identify and Remove Problematic Characters easy

Locate and remove specific characters in your data that are causing the encoding issue.

1
Connect to your PostgreSQL database using `psql` or your preferred client.
psql -U your_user -d your_database -h your_host
2
Identify the table and column(s) where the error is occurring. You might need to inspect the application logs or the exact query that produced the error.
SELECT * FROM your_table WHERE your_column ~ '[^\x00-\x7F]' LIMIT 10;
3
If the above query returns results, it means there are non-ASCII characters. You can try to remove or replace them. For example, to remove all non-ASCII characters from a specific column:
UPDATE your_table SET your_column = regexp_replace(your_column, '[^\x00-\x7F]+', '', 'g');
4
Alternatively, if you suspect specific problematic characters, you can try to identify them first and then remove them individually. For instance, to remove a specific character (e.g., a smart quote):
UPDATE your_table SET your_column = REPLACE(your_column, '’', '');
5
Commit the changes to make them permanent.
COMMIT;

2. Correct Client Encoding Settings easy

Ensure your client application is sending data with the correct encoding that your PostgreSQL server expects.

1
Determine the encoding of your PostgreSQL server. This is typically set during initialization and can be found in `postgresql.conf` (look for `server_encoding`) or by running the following SQL query:
SHOW server_encoding;
2
Configure your client application (e.g., programming language driver, ETL tool, command-line client) to use the same encoding as the server. For example, if your server is UTF-8, ensure your client is also configured to use UTF-8.
For Python with psycopg2: `conn = psycopg2.connect(database='your_db', user='your_user', password='your_password', host='your_host', port='5432', options='-c client_encoding=UTF8')`
For psql: `export PGCLIENTENCODING=UTF8`
3
Re-run the operation that caused the error. If the client encoding was the issue, this should resolve it.
N/A

3. Convert Data to a Compatible Encoding medium

If your data contains characters that are not supported by the current database encoding, convert the data to a more capable encoding like UTF-8.

1
Check your current database encoding:
SHOW server_encoding;
2
If your `server_encoding` is not UTF-8 (e.g., LATIN1), and you have data with characters outside of that set, consider migrating to UTF-8. This is a more involved process and requires careful planning.
N/A
3
To convert existing data in a column to UTF-8, you can use `convert_from` and `convert_to` functions. This example assumes you have a column `your_column` with data in `LATIN1` and you want to store it as `UTF8` after cleaning.
ALTER TABLE your_table ALTER COLUMN your_column TYPE text USING convert_from(convert_to(your_column, 'LATIN1'), 'UTF8');
4
If you need to change the entire database encoding, it's often best to create a new database with the desired encoding (UTF-8) and dump/restore your data. This is a more robust solution for a permanent fix.
1. Create a new database: `CREATE DATABASE new_db WITH ENCODING 'UTF8' LC_COLLATE='en_US.UTF-8' LC_CTYPE='en_US.UTF-8' TEMPLATE template0;`
2. Dump the old database: `pg_dump -U your_user old_db > old_db_dump.sql`
3. Restore to the new database: `psql -U your_user -d new_db < old_db_dump.sql`
🔗

Related Errors

5 related errors