UTF-8 encoded data in PHP and MySQL

Since every developer has probably struggled with data encoding at least once in their career, here is a short summary of “how to get your data in and out of a MySQL database with PHP”:
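- When you create your MySQL database, set the collation to one of the UTF-8 options (e.g. utf8_general_ci). Keep in mind that MySQL's utf8 stores at most three bytes per character, so on MySQL 5.5.3 or later utf8mb4 (e.g. utf8mb4_general_ci) is the safer choice if you need characters outside the Basic Multilingual Plane, such as emoji. If you have no rights to create or change your database, do this on the table level.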
Note: if you use phpMyAdmin, it’s good to set your “session” encoding to UTF-8 before you sign in (although I believe this is now the default). Otherwise perfectly good data could look weird.
- Secondly, when you create a database connection in PHP, be sure to set the character set of your connection to UTF-8 by executing
SET NAMES 'utf8'
(notice the missing dash). Some extensions/adapters may have a method to do this (e.g. PDO) or allow you to pass it as an option.
- Third, save your PHP script/file as UTF-8 no-BOM (no byte-order mark, since UTF-8 does not have byte order issues).
- Last, but most important and applicable to many more projects: set the charset of your page through the HTTP header, e.g.:
header('Content-Type: text/html;charset=utf-8');
Note: do not use the <meta> tag (solely) for this. Only using the meta tag causes most browsers to start re-parsing your content once they notice it’s UTF-8 instead of their default assumption. This only slows down page loading.
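The PHP side of the steps above can be sketched roughly as follows. This is only a minimal sketch, assuming PDO with the MySQL driver; the helper names (buildUtf8Dsn, connectUtf8, utf8ContentType) and the host, database and credentials in the usage comment are made up for illustration. Swap 'utf8' for 'utf8mb4' if your database uses that character set.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
// pdo_mysql honors the "charset" DSN option from PHP 5.3.6 onward.
function buildUtf8Dsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens the connection and sets the connection charset.
function connectUtf8(string $host, string $dbName, string $user, string $password): PDO
{
    $pdo = new PDO(buildUtf8Dsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    // Explicit statement from the list above, kept as a fallback in case
    // the DSN charset is ignored by an older setup.
    $pdo->exec("SET NAMES 'utf8'");
    return $pdo;
}

// Hypothetical helper: returns the header line from the last step.
function utf8ContentType(): string
{
    return 'Content-Type: text/html;charset=utf-8';
}

// Typical use in a page script (send the header before any output):
// header(utf8ContentType());
// $pdo = connectUtf8('localhost', 'blog', 'blog_user', 'secret');
```

With mysqli instead of PDO, calling set_charset('utf8') on the connection object should serve the same purpose as the SET NAMES statement.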
This post is certainly not meant to be a Unicode manual, but it can be a quick reference whenever you wonder why you see those strange characters in your database.
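To give an idea of where such strange characters typically come from, here is a small illustration, assuming the mbstring extension is available. It takes the two bytes that encode "é" in UTF-8 and deliberately misreads them as ISO-8859-1, which is roughly what happens when one layer (connection, file or page) assumes the wrong character set.

```php
<?php
// The two bytes that encode "é" in UTF-8.
$utf8Bytes = "\xC3\xA9";

// Treat each byte as a separate ISO-8859-1 character and re-encode the
// result as UTF-8, mimicking a layer that assumes the wrong charset.
$garbled = mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');

echo $garbled, "\n"; // prints "Ã©" on a UTF-8 terminal
```

If you spot pairs like this in your data, some step in the chain is most likely not set to UTF-8.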




