Error
Error Code:
22021
PostgreSQL Error 22021: Invalid Character Encoding
Description
The 'character not in repertoire' error (22021) in PostgreSQL indicates that a character in the data being processed cannot be represented by the current character set encoding. This typically happens when data containing characters from one encoding (e.g., UTF-8) is inserted into a database or column configured for a different, more restrictive encoding (e.g., LATIN1), or vice-versa.
Error Message
character not in repertoire
Known Causes
3 known causesDatabase Encoding Mismatch
The character set of the data being inserted or processed does not match the encoding configured for the PostgreSQL database or a specific column.
Client Connection Encoding
The character set encoding of the client application connecting to PostgreSQL differs from the database's encoding, leading to incorrect character interpretation.
Invalid Character Data
Input data contains characters that are corrupted, malformed, or simply not valid for *any* character set, making them unrepresentable.
Solutions
3 solutions available1. Identify and Remove Problematic Characters easy
Locate and remove specific characters in your data that are causing the encoding issue.
1
Connect to your PostgreSQL database using `psql` or your preferred client.
psql -U your_user -d your_database -h your_host
2
Identify the table and column(s) where the error is occurring. You might need to inspect the application logs or the exact query that produced the error.
SELECT * FROM your_table WHERE your_column ~ '[^\x00-\x7F]' LIMIT 10;
3
If the above query returns results, it means there are non-ASCII characters. You can try to remove or replace them. For example, to remove all non-ASCII characters from a specific column:
UPDATE your_table SET your_column = regexp_replace(your_column, '[^\x00-\x7F]+', '', 'g');
4
Alternatively, if you suspect specific problematic characters, you can try to identify them first and then remove them individually. For instance, to remove a specific character (e.g., a smart quote):
UPDATE your_table SET your_column = REPLACE(your_column, '’', '');
5
Commit the changes to make them permanent.
COMMIT;
2. Correct Client Encoding Settings easy
Ensure your client application is sending data with the correct encoding that your PostgreSQL server expects.
1
Determine the encoding of your PostgreSQL server. This is typically set during initialization and can be found in `postgresql.conf` (look for `server_encoding`) or by running the following SQL query:
SHOW server_encoding;
2
Configure your client application (e.g., programming language driver, ETL tool, command-line client) to use the same encoding as the server. For example, if your server is UTF-8, ensure your client is also configured to use UTF-8.
For Python with psycopg2: `conn = psycopg2.connect(database='your_db', user='your_user', password='your_password', host='your_host', port='5432', options='-c client_encoding=UTF8')`
For psql: `export PGCLIENTENCODING=UTF8`
3
Re-run the operation that caused the error. If the client encoding was the issue, this should resolve it.
N/A
3. Convert Data to a Compatible Encoding medium
If your data contains characters that are not supported by the current database encoding, convert the data to a more capable encoding like UTF-8.
1
Check your current database encoding:
SHOW server_encoding;
2
If your `server_encoding` is not UTF-8 (e.g., LATIN1), and you have data with characters outside of that set, consider migrating to UTF-8. This is a more involved process and requires careful planning.
N/A
3
To convert existing data in a column to UTF-8, you can use `convert_from` and `convert_to` functions. This example assumes you have a column `your_column` with data in `LATIN1` and you want to store it as `UTF8` after cleaning.
ALTER TABLE your_table ALTER COLUMN your_column TYPE text USING convert_from(convert_to(your_column, 'LATIN1'), 'UTF8');
4
If you need to change the entire database encoding, it's often best to create a new database with the desired encoding (UTF-8) and dump/restore your data. This is a more robust solution for a permanent fix.
1. Create a new database: `CREATE DATABASE new_db WITH ENCODING 'UTF8' LC_COLLATE='en_US.UTF-8' LC_CTYPE='en_US.UTF-8' TEMPLATE template0;`
2. Dump the old database: `pg_dump -U your_user old_db > old_db_dump.sql`
3. Restore to the new database: `psql -U your_user -d new_db < old_db_dump.sql`