User Tools

Site Tools


mysql:configure_utf8mb4

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
mysql:configure_utf8mb4 [2016/07/03 10:06] petermysql:configure_utf8mb4 [2020/07/15 09:30] (current) – external edit 127.0.0.1
Line 5: Line 5:
 These instructions show how to convert the entire database to be **utf8mb4**. These instructions show how to convert the entire database to be **utf8mb4**.
  
-Back up your data first.+**IMPORTANT:**
  
-===== Update Database =====+The following instructions assume that a case-sensitive database is required throughout.  However it is easy to have any part of the instructions be set to support case-insensitive text.  Simply:
  
-Update the default character set and collation for your database.+  * For case-sensitive text set the character set to **utf8mb4_bin**. 
 +  * For case-insensitive text, the character set can be set to **utf8mb4_unicode_ci** instead.
  
-For case-sensitive text set the character set to **utf8mb4_bin**. 
  
-<code mysql> 
-ALTER DATABASE `mydatabase` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_bin; 
-</code> 
  
 +===== Back up your data =====
  
-For case-insensitive text, then the character set can be set to **utf8mb4_unicode_ci** instead.+Before making any change to a database it is always recommended that a full backup of the database is taken. 
 + 
 + 
 +===== Update Database ===== 
 + 
 +Update the default character set and collation for your database.
  
 <code mysql> <code mysql>
-ALTER DATABASE `mydatabase` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;+ALTER DATABASE `mydatabase` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
 </code> </code>
- + 
  
 ===== Update Tables ===== ===== Update Tables =====
Line 30: Line 34:
  
 <code mysql> <code mysql>
-ALTER TABLE `mytable1` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_bin;+ALTER TABLE `mytable1` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
 </code> </code>
   
- 
- 
 ===== Update Table Columns ===== ===== Update Table Columns =====
  
Line 40: Line 42:
  
 <code mysql> <code mysql>
-ALTER TABLE `mytable1` CHANGE `mycolumn` `mycolumn` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_bin NOT NULL DEFAULT  '';+ALTER TABLE `mytable1` CHANGE `mycolumn` `mycolumn` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT  '';
 </code> </code>
  
-  +Ensure you convert to the same column type and length, etc, the above is for example only and if you copy/paste it, you may screw up your column schema.  Essentially you're just using ALTER TABLE CHANGE on the column in order to change the character set to **utf8mb4** and collation to **utf8mb4_bin**.
-Now obviously you're going to want to make sure that you're converting to the same column type and length, etc, the above is for example only and if you copy/paste it, you may screw up your column schema.  Essentially you're just using ALTER TABLE CHANGE on the column in order to change the character set to utf8mb4 and collation to utf8mb4_unicode_bin.+
  
 If on the other hand you just want to change the entire the entire table at once, you can do: If on the other hand you just want to change the entire the entire table at once, you can do:
  
 <code mysql> <code mysql>
-ALTER TABLE `mytable1` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_bin;+ALTER TABLE `mytable1` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
 </code> </code>
   
Line 70: Line 71:
 If you're using InnoDB on MySQL 5.6.3 or higher you can enable **innodb_large_prefix** in your MySQL config file (more information in manual http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix), but if you aren't you can take a few steps to work it out the old way: If you're using InnoDB on MySQL 5.6.3 or higher you can enable **innodb_large_prefix** in your MySQL config file (more information in manual http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix), but if you aren't you can take a few steps to work it out the old way:
  
-  Make note of the conflicting index, it's going to likely be one which is something like VARCHAR(255) or an index across multiple columns which includes VARCHAR.  Make note of the index name, type, and which column(s) it crosses. +  Make note of the conflicting index, it's going to likely be one which is something like VARCHAR(255) or an index across multiple columns which includes VARCHAR.  Make note of the index name, type, and which column(s) it crosses. 
-  - In my own scenario, I had a lot of columns which included some sort of VARCHAR(254) and ID which was binary(20).  Now it seems like 254+20 = 274, and hey that's less than 767 (or 1000) so what's the deal?  Well, MySQL doesn't count literal bytes in VARCHAR when it comes to Unicoderather potential Unicode bytes are themselves counted as a byte (wait, what?).  So if the column is 254 and it's utf8 that means the actual potential length is literally (254 * 3) bytes, and with utf8mb4 it's (254 * 4).  So really the length of the key you're trying to create is ((254 * 4) + 20).  InnoDB only allows a maximum of 255 bytes for the column in an index with utf8 and 191 bytes for utf8mb4 So if you need the entire column indexed, you aren't going to want to change the character set for that column(s), and instead I recommend changing all others one by one (as seen in the Table Columns section) rather than trying to convert the entire table. However if you do not need the entire column to be index, and in certain cases I did not.  Drop the index:+ 
 +  * utf8mb4 uses 4 bytes to store every charactertherefore the maximum length of an index can be ~767/4=191.
  
 <code mysql> <code mysql>
Line 77: Line 79:
 </code> </code>
         
- 
 Then recreate it with the offending column(s) limited to 191: Then recreate it with the offending column(s) limited to 191:
  
Line 94: Line 95:
  
  
 +How does one find the offending index?  SHOW INDEXES FROM foo; will show all indexes on the table.  Combine that with DESCRIBE foo; and you can figure out which indexes are on columns longer than 191 characters.  With that out of the way, back to the action:
 +
 +<code mysql>
 +mysql> DROP INDEX foo_1234 on foo;
 +Query OK, 0 rows affected (0.00 sec)
 +Records: 0  Duplicates: 0  Warnings: 0
 +mysql> ALTER TABLE foo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
 +Query OK, 0 rows affected (0.11 sec)
 +Records: 0  Duplicates: 0  Warnings: 0
 +CREATE INDEX foo_1234 ON foo (`baz`(191));
 +</code>
 +
 +===== Additional Notes and Considerations =====
 +
 +If a column is not being used for search and case insensitivity isn't an issue, instead of using CHAR or VARCHAR, I suggest using BINARY and VARBINARY.  Not only is comparison vastly faster, but also there's less to worry about as far as character set issues go, i.e. they don't matter.  Further also VARBINARY is literal length so the UTF-8 limitations described in the index section of this post do not apply, so you can get the full width for your index.
 +
 +Additionally instead of using TEXT, use BLOB, for the same reasons, but also realize the same limitations apply, such as no fulltext searching.
 +
 +In summation, if you don't need case sensitivity and you don't need fulltext search, consider BINARY, VARBINARY, and BLOB over CHAR, VARCHAR, and TEXT, it'll be a lot easier to deal with when it comes to Unicode.
 +
 +You can learn more about this on my MySQL performance, using case insensitive columns post.
 +
 +
 +
 +===== Database Connections =====
 +
 +Depending on your programming language, you may need to specify when connecting which chartype to use (you can also, in most cases, specify this on configuration, see that section at the bottom), this usually can be done by sending this query  right after connection:
 +
 +<code mysql>
 +SET NAMES utf8mb4;
 +</code>
 +
 +
 +
 +===== Configuration =====
 +
 +You can edit your my.cnf (or my.ini on Windows) and make these changes to the appropriate sections of the configuration file (applicable to MySQL 5.6, older versions may need adjusted configuration):
 +
 +<code mysql>
 +[client]
 +default-character-set = utf8mb4
 + 
 +[mysql]
 +default-character-set = utf8mb4
 + 
 +[mysqld]
 +character-set-client-handshake = FALSE
 +character-set-server = utf8mb4
 +collation-server = utf8mb4_bin
 +</code>
mysql/configure_utf8mb4.1467540373.txt.gz · Last modified: 2020/07/15 09:30 (external edit)

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki