User Tools

Site Tools


mysql:configure_utf8mb4

MySQL - Configure utf8mb4

UTF-8 in MySQL isn't fully supported by the utf8 character set, it only supports a maximum of 3-byte characters. To have full UTF-8 support in MySQL then the utf8mb4 character set needs to be used.

These instructions show how to convert the entire database to be utf8mb4.

IMPORTANT:

The following instructions assume that a case-sensitive database is required throughout. However it is easy to have any part of the instructions be set to support case-insensitive text. Simply:

  • For case-sensitive text set the character set to utf8mb4_bin.
  • For case-insensitive text, the character set can be set to utf8mb4_unicode_ci instead.

Back up your data

Before making any change to a database it is always recommended that a full backup of the database is taken.

Update Database

Update the default character set and collation for your database.

ALTER DATABASE `mydatabase` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Update Tables

Change the default character set of the table. This way when you add new columns in the future you don't need to worry about adding all of the character set specification.

ALTER TABLE `mytable1` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Update Table Columns

Now, you can convert one column at a time, and this may be what you wish to do if you require different character sets for your CHAR, VARCHAR, and TEXT columns, here's how you do that:

ALTER TABLE `mytable1` CHANGE `mycolumn` `mycolumn` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT  '';

Ensure you convert to the same column type and length, etc, the above is for example only and if you copy/paste it, you may screw up your column schema. Essentially you're just using ALTER TABLE CHANGE on the column in order to change the character set to utf8mb4 and collation to utf8mb4_bin.

If on the other hand you just want to change the entire the entire table at once, you can do:

ALTER TABLE `mytable1` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Update Table Indices

When changing the character type, you may run into this on InnoDB:

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

or this on MyISAM:

ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

If you're using InnoDB on MySQL 5.6.3 or higher you can enable innodb_large_prefix in your MySQL config file (more information in manual http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix), but if you aren't you can take a few steps to work it out the old way:

  • Make note of the conflicting index, it's going to likely be one which is something like VARCHAR(255) or an index across multiple columns which includes VARCHAR. Make note of the index name, type, and which column(s) it crosses.
  • utf8mb4 uses 4 bytes to store every character, therefore the maximum length of an index can be ~767/4=191.
ALTER TABLE `mytable1` DROP INDEX `theindex`;

Then recreate it with the offending column(s) limited to 191:

ALTER TABLE `mytable1` ADD INDEX `theindex` (`mycolumn` (191));

or if across multiple columns (assuming mycolumnb is not utf8 for example):

ALTER TABLE `mytable1` ADD INDEX `theindex` (`mycolumn` (191),`mycolumnb`);

As long as the indices are the same, and in the same column order, you should receive the same benefits for the indices without worrying about redoing your queries.

How does one find the offending index? SHOW INDEXES FROM foo; will show all indexes on the table. Combine that with DESCRIBE foo; and you can figure out which indexes are on columns longer than 191 characters. With that out of the way, back to the action:

mysql> DROP INDEX foo_1234 on foo;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0
CREATE INDEX foo_1234 ON foo (`baz`(191));

Additional Notes and Considerations

If a column is not being used for search and case insensitivity isn't an issue, instead of using CHAR or VARCHAR, I suggest using BINARY and VARBINARY. Not only is comparison vastly faster, but also there's less to worry about as far as character set issues go, i.e. they don't matter. Further also VARBINARY is literal length so the UTF-8 limitations described in the index section of this post do not apply, so you can get the full width for your index.

Additionally instead of using TEXT, use BLOB, for the same reasons, but also realize the same limitations apply, such as no fulltext searching.

In summation, if you don't need case sensitivity and you don't need fulltext search, consider BINARY, VARBINARY, and BLOB over CHAR, VARCHAR, and TEXT, it'll be a lot easier to deal with when it comes to Unicode.

You can learn more about this on my MySQL performance, using case insensitive columns post.

Database Connections

Depending on your programming language, you may need to specify when connecting which chartype to use (you can also, in most cases, specify this on configuration, see that section at the bottom), this usually can be done by sending this query right after connection:

SET NAMES utf8mb4;

Configuration

You can edit your my.cnf (or my.ini on Windows) and make these changes to the appropriate sections of the configuration file (applicable to MySQL 5.6, older versions may need adjusted configuration):

[client]
default-character-set = utf8mb4
 
[mysql]
default-character-set = utf8mb4
 
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_bin
mysql/configure_utf8mb4.txt · Last modified: 2020/07/15 10:30 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki