UTF-8 encoded data in PHP and MySQL

Since every developer has probably struggled with data encoding at some point in their career, here is a short summary of how to get your data in and out of a MySQL database with PHP:

  • First, when you create your MySQL database, set the collation to one of the UTF-8 options (e.g. utf8_general_ci). If you do not have the rights to create or alter the database, set it at the table level instead. Be aware that MySQL’s utf8 character set stores at most three bytes per character; if you need characters outside the Basic Multilingual Plane (such as emoji), use utf8mb4 and one of its collations.
    Note: if you use phpMyAdmin, it’s good to set your “session” encoding to UTF-8 before you sign in (although I believe this is now the default). Otherwise perfectly good data can appear garbled.
  • Second, when you create a database connection in PHP, be sure to set the character set of your connection to UTF-8 by executing

    SET NAMES 'utf8'

    (note the missing dash: MySQL names the character set utf8, not utf-8). Some extensions have a dedicated way to do this: mysqli provides set_charset(), and PDO accepts a charset option in the DSN.

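  • Third, save your PHP script/file as UTF-8 without a BOM. UTF-8 has no byte-order issues, so the byte-order mark is unnecessary, and in PHP those bytes are sent as output before your code runs, which can break later header() calls.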
  • Last, but most important and applicable to many more projects: set the charset of your page through the HTTP header, e.g.:

    header('Content-Type: text/html;charset=utf-8');

    Note: do not rely solely on the <meta> tag for this. If only the meta tag is present, most browsers start re-parsing your content once they notice it’s UTF-8 rather than their default assumption, which only slows down page loading.
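
The steps above can be put together roughly as follows. This is only a minimal sketch, assuming PDO with the MySQL driver and PHP 7.1 or later; the helper names (buildCreateDatabaseSql, buildUtf8Dsn, connectUtf8, sendUtf8Header) and the host, database and credential values are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical helper: builds the SQL for creating a database with a UTF-8
// character set and collation. $charset and $collation must come from
// trusted code, since they are inserted into the statement as-is.
function buildCreateDatabaseSql(
    string $dbName,
    string $charset = 'utf8',
    string $collation = 'utf8_general_ci'
): string {
    // Identifiers cannot be bound as parameters, so quote the name with
    // backticks and double any backtick it contains.
    $quotedName = '`' . str_replace('`', '``', $dbName) . '`';

    return "CREATE DATABASE {$quotedName} CHARACTER SET {$charset} COLLATE {$collation}";
}

// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
// The charset option in the DSN (honoured since PHP 5.3.6) takes the place
// of running SET NAMES by hand. Pass 'utf8mb4' for four-byte characters.
function buildUtf8Dsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection with the DSN built above and
// makes PDO throw exceptions instead of failing silently.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(buildUtf8Dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Hypothetical helper: declares the page charset in the HTTP header.
// It has to run before any output, which is also why the file itself
// should be saved without a BOM.
function sendUtf8Header(): void
{
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}
```

A page would typically call sendUtf8Header() before producing any output and then something like connectUtf8('localhost', 'example_db', 'example_user', 'secret'), where all four arguments are placeholders for your own settings.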

This post is certainly not meant to be a Unicode manual, but it can serve as a quick reference whenever you wonder why you see those strange characters in your database.