UTF-8 encoded data in PHP and MySQL

Since every developer has probably struggled with data encoding at least once in their career, here is a short summary of how to get your data in and out of a MySQL database with PHP:
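- First, when you create your MySQL database, set the collation to one of the UTF-8 options (e.g. utf8_general_ci). If you do not have the rights to create or alter the database, set it at the table level instead. Keep in mind that MySQL's utf8 character set stores at most three bytes per character; on versions that support it, the utf8mb4 collations (e.g. utf8mb4_general_ci) cover the full Unicode range.
Note: if you use phpMyAdmin, it's good to set your “session” encoding to UTF-8 before you sign in (although I believe this is now the default). Otherwise perfectly good data could be displayed as garbage.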
- Second, when you create a database connection in PHP, be sure to set the character set of your connection to UTF-8 by executing
SET NAMES 'utf8' (notice the missing dash: MySQL spells it utf8, not utf-8). Some extensions/adapters have a method for this (e.g. mysqli's set_charset()) or let you pass the character set as an option (e.g. the charset parameter of the PDO MySQL DSN).
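- Third, save your PHP script/file as UTF-8 without a BOM. UTF-8 has no byte-order issues, so the byte-order mark is unnecessary, and a BOM in front of the opening PHP tag is sent to the browser as output before your script can set any headers.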
- Last, but most important and applicable to many more projects: set the charset of your page through the HTTP header, e.g.:
header('Content-Type: text/html;charset=utf-8');
Note: do not rely on the <meta> tag alone for this. With only the meta tag, most browsers start re-parsing your content once they notice it's UTF-8 instead of their default assumption, which only slows down page loading.
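The steps above can be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the PDO MySQL driver is available and a PHP version that honours the charset DSN parameter (5.3.6 or later). The function names, the articles table and the connection details are hypothetical placeholders.

```php
<?php
// Build a PDO MySQL DSN that requests a UTF-8 connection (step 2).
// 'utf8' follows the text above; use 'utf8mb4' if your tables do.
function utf8Dsn($host, $dbname)
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $dbname);
}

// Open the connection and set the connection character set explicitly.
function connectUtf8($host, $dbname, $user, $password)
{
    $pdo = new PDO(utf8Dsn($host, $dbname), $user, $password, array(
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ));
    // Redundant with the DSN option on current PHP versions, but harmless,
    // and it is the statement to use with adapters that lack such an option.
    $pdo->exec("SET NAMES 'utf8'");
    return $pdo;
}

// Step 1 at the table level, for when you cannot alter the database itself.
// The table and its columns are made up for illustration.
function createUtf8Table(PDO $pdo)
{
    $pdo->exec(
        "CREATE TABLE IF NOT EXISTS articles (
            id INT AUTO_INCREMENT PRIMARY KEY,
            title VARCHAR(255) NOT NULL
        ) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci"
    );
}

// Step 4: declare the page charset in the HTTP header, before any output.
function sendUtf8Header()
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}
```

In a page script you would call sendUtf8Header() before producing any output, then connectUtf8('localhost', 'example_db', 'example_user', 'secret') with your own credentials in place of these made-up ones.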
This post is certainly not meant to be a Unicode manual, but it can serve as a quick reference whenever you wonder why you see those strange characters in your database.
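If those strange characters do show up, it can help to look at the raw bytes. The sketch below reproduces the most common cause, UTF-8 bytes being read as ISO-8859-1 and re-encoded, and shows a simple validity check. It assumes the mbstring extension and PHP 7 or later (for the \u{...} escape); the function names are made up for illustration.

```php
<?php
// True if the byte string is well-formed UTF-8.
function isValidUtf8($bytes)
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Simulate a misconfigured layer: treat every UTF-8 byte as an
// ISO-8859-1 character and encode the result as UTF-8 again.
function simulateMojibake($utf8)
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$original = "caf\u{00E9}";              // "café" as UTF-8 bytes
$garbled  = simulateMojibake($original); // displays as "cafÃ©"

echo bin2hex($original), "\n"; // 636166c3a9
echo bin2hex($garbled), "\n";  // 636166c383c2a9

// A lone Latin-1 byte is not valid UTF-8, which is another common symptom.
var_dump(isValidUtf8("caf\xE9")); // bool(false)
```

Seeing c383c2a9 where you expected c3a9 suggests the data was encoded twice somewhere between the script, the connection and the table.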

