Thursday, April 11, 2013

MySQL, UTF-8, Java - Data Imports

I had a terrible time getting UTF-8 data, imported from a backend process, to display as-is in my webapp without any hiccups. Turns out this is one of those things where, if you get it right, you never notice, but if you don't .. then you are perhaps going to want to write a blog post like this one :)

So in my endeavor, I did some research and found posts like: Character Encoding UTF-8 with JPA/Hibernate, MySql and Tomcat

While the advice there is apt, it does not address the problem I had, which was not just about the data itself but about ensuring that the data is imported as pure UTF-8.
Turns out that, in my case, no my.cnf changes and no mysqld parameters were required. Nor was a complicated JDBC connection string with parameters like &useUnicode=true&characterEncoding=UTF-8

Only the following steps need to be kept in mind:
  1. Ensure a clean data import:  This is the most critical step. NEVER copy/paste your data. Ensure the file that contains your data is saved in the proper encoding, in our case UTF-8.
  2. Ensure the schema and table definitions use the same encoding:  
    create database my_schema_name character set utf8 DEFAULT COLLATE utf8_general_ci;
  3. Import via the command prompt, telling the client which character set the file uses:  
    mysql -u root -p -h localhost dbname --default-character-set=utf8 < filename.sql
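
Step 1 is easy to get wrong silently, so it can help to check the file before importing it. Below is a minimal sketch in Java, assuming the dump fits in memory; the class name Utf8Check and the method isValidUtf8 are made up for illustration and are not part of any library. It decodes the bytes with a strict UTF-8 decoder and reports whether decoding succeeds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: checks whether a file's bytes are well-formed UTF-8.
class Utf8Check {

    /** Returns true only if the bytes decode as strict UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                // Fail on bad byte sequences instead of substituting U+FFFD.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        // Reads the whole file into memory; fine for modest dumps only.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

Running it against filename.sql before step 3 would catch, for example, a file that was saved as Latin-1 by an editor. It cannot catch text that was already double-encoded, since that is still well-formed UTF-8.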


Some other misc notes:
  • If you decide to use connection params in your JDBC string to MySQL, use UTF-8 instead of utf8, since characterEncoding expects the Java-style charset name. Refer to MySQL Char Set Ref for details.
  • If you put the connection string directly in an XML file, wrap it in CDATA or replace & with &amp;. If you import it, say, from a properties file into a Spring config, ignore this.
  • When rendering your HTML, ensure the page character set supports UTF-8. Further, you may convert encoded text by using a function like:
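    // Re-renders each matched element's text as HTML so that encoded
    // entities show up as characters. Use on trusted content only,
    // since any markup in the text becomes live HTML.
    function replaceEncodedChars(elementHTMLId) {
       $(elementHTMLId).each(
        function( intIndex ){
         // .html() rather than .attr('innerHTML', ...): innerHTML is a
         // DOM property, and newer jQuery versions do not set it via attr()
         $(this).html($(this).text());
        }
       );
    };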
    
    Sample Usage: